The present invention relates to a method of providing
novel DNA sequences encoding a polypeptide with an activity
of interest, comprising the following steps: i) PCR amplification
of said DNA with PCR primers with homology to (a)
known gene(s) encoding a polypeptide with an activity of interest,
ii) linking the obtained PCR product to a 5' structural gene
sequence and a 3' structural gene sequence, iii) expressing said
resulting hybrid DNA sequence, iv) screening for hybrid DNA
sequences encoding a polypeptide with said activity of interest
or related activity, v) isolating the hybrid DNA sequence
identified in step iv). Further, the invention also relates to novel DNA
sequences provided according to the method of the invention
and polypeptides with an activity of interest encoded by said
novel DNA sequences of the invention.

Title: Method of providing novel DNA sequences

FIELD OF THE INVENTION

The present invention relates to a method of providing novel
DNA sequences encoding a polypeptide with an activity of
interest, to novel DNA sequences provided according to the method
of the invention, and to polypeptides with an activity of interest
encoded by novel DNA sequences of the invention.

BACKGROUND OF THE INVENTION 

The advent of recombinant DNA techniques has made it possible
to select single protein components with interesting properties
and produce them on a large scale. This represents an improvement
over the previously employed production process using
microorganisms isolated from nature and producing a mixture of proteins
which would either be used as such or separated after the
production step.

Since the traditional methods were rather time-consuming, more 
rapid and less cumbersome methods were developed. 

One such technique is described in WO 93/11249 (Novo Nordisk
A/S).

The method described in WO 93/11249 comprises the steps of: 

a) cloning, in suitable vectors, a DNA library from an organism 
suspected of producing one or more proteins of interest; 

b) transforming suitable yeast host cells with said vectors; 

c) culturing the host cells under suitable conditions to express
any protein of interest encoded by a clone in the DNA library;
and

d) screening for positive clones by determining any activity of
a protein expressed in step c).

According to this method it is necessary to prepare a DNA
library comprising complete genes encoding polypeptides with
activities of interest. Such a library has traditionally been
made from mRNA isolated from micro-organisms which have been
cultivated and isolated.

As known methods make it possible to cultivate only about
2% of the microorganisms known today (i.e. cultivable
microorganisms), genes encoding polypeptides from a huge number of
microorganisms (i.e. un-cultivable microorganisms) are generally
difficult to identify and clone on the basis of the screening
technologies used today, such as the one mentioned above.

SUMMARY OF THE INVENTION

It is the object of the present invention to provide a method 
for providing a novel DNA sequence encoding a polypeptide with an 
activity of interest from micro-organisms without having to cul- 
tivate and isolate said micro-organisms. 

In the first aspect the invention relates to a method of
providing novel DNA sequences encoding a polypeptide with an
activity of interest, comprising the following steps:

i) PCR amplification of said DNA with PCR primers with homology
to (a) known gene(s) encoding a polypeptide with an activity of
interest,

ii) linking the obtained PCR product to a 5' structural gene
sequence and a 3' structural gene sequence,

iii) expressing said resulting hybrid DNA sequence,

iv) screening for hybrid DNA sequences encoding a polypeptide
with said activity of interest or related activity,

v) isolating the hybrid DNA sequence identified in step iv).

Further, the invention also relates to novel DNA sequences
provided according to the method of the invention and to
polypeptides with an activity of interest encoded by said novel
DNA sequences of the invention.

BRIEF DESCRIPTION OF THE DRAWING 

Figure 1 shows the cloning strategy of novel hybrid enzyme
sequences.

a is an exact N-terminal consensus primer
a rc is the reverse and complement primer to a
b is a degenerated homologous N-terminal primer
c is a degenerated homologous C-terminal primer
d is an exact C-terminal consensus primer
d rc is the reverse and complement of d
f is an exact reverse and complement C-terminal primer extended
with a sequence which includes a SalI restriction recognition
site.
e is an exact N-terminal primer extended with a sequence which
includes an EcoRI restriction recognition site.

1. (in figure 1)
PCR with primers ab and cd to amplify unknown core genes with
an activity of interest.
PCR with primers e and a rc to obtain the N-terminal part of
the known gene.
PCR with primers d rc and f to obtain the C-terminal part of the
known gene.

2. (in figure 1)
SOE-PCR with primers e and f to link the unknown core gene
sequence with the known N- and C-terminal gene sequences and
to introduce EcoRI and SalI restriction recognition sites.

3. Restriction enzyme digestion followed by ligation of the
novel sequence into an expression vector and transformation
into a host cell. Screening of clones expressing the produced
gene product with the activity of interest.
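
The roles of the extended primers e and f can be illustrated with a
minimal Python sketch. It assumes the known gene is given 5' to 3' on
the coding strand; the function names, the annealing length, the short
"GC" tail and the example sequence are hypothetical choices made only
for illustration and are not taken from the specification. The EcoRI
(GAATTC) and SalI (GTCGAC) recognition sequences are the standard ones.

```python
# Illustrative sketch only: helper names and the example gene are invented.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
SALI_SITE = "GTCGAC"   # SalI recognition sequence


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extended_primers(gene: str, anneal_len: int = 18, tail: str = "GC"):
    """Build primers e and f for a known gene (coding strand, 5'->3').

    primer e: tail + EcoRI site + exact N-terminal sequence
    primer f: tail + SalI site + reverse complement of the C-terminal end
    """
    primer_e = tail + ECORI_SITE + gene[:anneal_len]
    primer_f = tail + SALI_SITE + reverse_complement(gene[-anneal_len:])
    return primer_e, primer_f


# Hypothetical known gene, used only to exercise the functions.
gene = "ATGGCTAGCAAAGGTACCGATTTCCGTAAACTGGGTTAA"
primer_e, primer_f = extended_primers(gene)
print(primer_e)
print(primer_f)
```

Both recognition sites are palindromic, so they read the same on either
strand; only the annealing part of primer f has to be reverse
complemented.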

Figure 2 shows a part of an alignment of prokaryote
xylanases belonging to glycosyl hydrolases family 11.

Figure 3 shows an alignment of the translated DNA
sequences of Pulpzyme® (SEQ ID NO 2) and the novel gene
sequence found in soil, respectively.

Figure 4 shows schematically a novel hybrid gene provided
according to the invention. Part A and Part C are the known
sequences linked to the unknown Part B.

Using Pulpzyme® (SEQ ID NO 1) as the starting sequence:
"1" indicates the first nucleotide of the novel hybrid gene
provided according to the invention, "433" and "631" the start
and end of the part constituted by the unknown gene sequence
and "741" the last nucleotide of the novel hybrid gene
sequence.
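
The coordinates given for Figure 4 can be turned into part lengths with
a short Python sketch. It assumes, as the wording suggests but does not
state outright, that the positions are 1-based and inclusive, so that
Part B runs from nucleotide 433 to nucleotide 631. The function names
are hypothetical.

```python
# Coordinates quoted in the Figure 4 description (assumed 1-based, inclusive).
FIRST_NT, CORE_START, CORE_END, LAST_NT = 1, 433, 631, 741


def part_lengths(first_nt: int, core_start: int, core_end: int, last_nt: int):
    """Return the lengths of Part A, Part B and Part C in nucleotides."""
    part_a = core_start - first_nt
    part_b = core_end - core_start + 1
    part_c = last_nt - core_end
    return part_a, part_b, part_c


def split_hybrid(hybrid: str, core_start: int, core_end: int):
    """Split a hybrid gene string into (Part A, Part B, Part C)."""
    return (
        hybrid[: core_start - 1],
        hybrid[core_start - 1 : core_end],
        hybrid[core_end:],
    )


lengths = part_lengths(FIRST_NT, CORE_START, CORE_END, LAST_NT)
print(lengths, sum(lengths))
```

Under this reading the three parts add up to the full 741 nucleotides
of the hybrid gene.
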

DEFINITIONS

Prior to discussing this invention in further detail, the
following terms will first be defined.

"Homology of DNA sequences or polynucleotides": In the
present context the degree of DNA sequence homology is determined
as the degree of identity between two sequences indicating a
derivation of the first sequence from the second. The homology
may suitably be determined by means of computer programs known in
the art, such as GAP provided in the GCG program package (Program
Manual for the Wisconsin Package, Version 8, August 1994,
Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA
53711) (Needleman, S.B. and Wunsch, C.D., (1970), Journal of
Molecular Biology, 48, 443-453).

"Homologous": The term "homologous" means that one single-
stranded nucleic acid sequence may hybridize to a complementary
single-stranded nucleic acid sequence. The degree of
hybridization may depend on a number of factors including the amount of
identity between the sequences and the hybridization conditions
such as temperature and salt concentration as discussed later
(vide infra).

Using the computer program GAP (vide supra) with the
following settings for DNA sequence comparison: GAP creation
penalty of 5.0 and GAP extension penalty of 0.3, it is in the
present context believed that two DNA sequences will be able to
hybridize (using low stringency hybridization conditions as
defined below) if they mutually exhibit a degree of identity
preferably of at least 70%, more preferably at least 80%, and
even more preferably at least 85%.

"Heterologous": If two or more DNA sequences mutually
exhibit a degree of identity which is less than specified above,
they are in the present context said to be "heterologous".
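
The identity thresholds above can be illustrated with a small Python
sketch. It does not reproduce the GAP program: it assumes the two
sequences have already been aligned (gaps written as "-") and simply
counts identical columns. Counting identity over all alignment columns
is an assumption of this sketch, and the function names are
hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns in a pairwise alignment.

    Both inputs must have the same length; '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def classify(aligned_a: str, aligned_b: str, min_identity: float = 70.0) -> str:
    """Label a pair 'homologous' or 'heterologous' by the identity cut-off."""
    identity = percent_identity(aligned_a, aligned_b)
    return "homologous" if identity >= min_identity else "heterologous"


# Invented example alignment: 8 identical columns out of 10.
print(percent_identity("ACGT-ACGTA", "ACGTTACGTT"))
print(classify("ACGT-ACGTA", "ACGTTACGTT"))
```

Passing `min_identity=80.0` or `min_identity=85.0` applies the more
preferred thresholds named in the definition.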

"Hybridization:" Suitable experimental conditions for 
determining whether two or more DNA sequences of interest
hybridize are herein defined as hybridization at low
stringency as described in detail below.

A suitable experimental low stringency hybridization
protocol between two DNA sequences of interest involves
presoaking of a filter containing the DNA fragments to hybridize
in 5 x SSC (sodium chloride/sodium citrate, Sambrook et al.
1989) for 10 min, and prehybridization of the filter in a
solution of 5 x SSC, 5 x Denhardt's solution (Sambrook et al.
1989), 0.5% SDS and 100 µg/ml of denatured sonicated salmon
sperm DNA (Sambrook et al. 1989), followed by hybridization in
the same solution containing a concentration of 10 ng/ml of a
random-primed (Feinberg, A. P. and Vogelstein, B. (1983) Anal.
Biochem. 132:6-13), 32P-dCTP-labeled (specific activity > 1 x
10^9 cpm/µg) probe (DNA sequence) for 12 hours at ca. 45°C.

The filter is then washed twice for 30 minutes in 2 x SSC, 0.5%
SDS at a temperature of at least 50°C, more preferably at least
55°C, and even more preferably at least 60°C (high stringency).

Molecules to which the oligonucleotide probe hybridizes
under these conditions are detected using an x-ray film.

"Alignment": The term "alignment" used herein in connection
with an alignment of a number of DNA and/or amino acid sequences
means that the sequences of interest are aligned in order to
identify mutual/common sequences of homology/identity between the
sequences of interest. This procedure is used to identify common
"conserved regions" (vide infra) between sequences of
interest. An alignment may suitably be determined by means of
computer programs known in the art, such as ClustalW or PILEUP
provided in the GCG program package (Program Manual for the
Wisconsin Package, Version 8, August 1994, Genetics Computer
Group, 575 Science Drive, Madison, Wisconsin, USA
53711) (Needleman, S.B. and Wunsch, C.D., (1970), Journal of
Molecular Biology, 48, 443-453).

"Conserved regions:" The term "conserved region" used herein 
in connection with a "conserved region" between DNA and/or
amino acid sequences of interest means a mutual common sequence
region of the sequences of interest, wherein there is a
relatively high degree of sequence identity between the
sequences of interest. In the present context a conserved
region is preferably at least 10 base pairs (bp)/3 amino
acids (a.a.), more preferably at least 20 bp/7 a.a., and even
more preferably at least 30 bp/10 a.a.

Using the computer program GAP (Program Manual for the
Wisconsin Package, Version 8, August 1994, Genetics Computer
Group, 575 Science Drive, Madison, Wisconsin, USA
53711) (Needleman, S.B. and Wunsch, C.D., (1970), Journal of
Molecular Biology, 48, 443-453) (vide supra) with the following
settings for DNA sequence comparison: GAP creation penalty of
5.0 and GAP extension penalty of 0.3, the degree of DNA
sequence identity within the conserved region is preferably
at least 80%, more preferably at least 85%, more preferably at
least 90%, and even more preferably at least 95%.
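
How conserved regions might be picked out of an existing alignment can
be sketched in Python as follows. This is a simplification: the
definition above allows a "relatively high" identity within the region,
whereas the sketch only reports runs of columns that are fully
identical in all sequences. The function name and the three short
example sequences are invented; only the motif "DGGTYDIY" appears in
the specification.

```python
def conserved_regions(aligned, min_len: int = 3):
    """Find runs of fully identical, gap-free columns in an alignment.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    min_len: minimum run length (3 a.a. or 10 bp are the lowest
             preferred sizes named in the definition).
    Returns a list of (start, end, motif) with 0-based, end-exclusive indices.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    regions = []
    run_start = None
    for i in range(length + 1):  # one extra step closes a run at the end
        is_conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if is_conserved:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                regions.append((run_start, i, aligned[0][run_start:i]))
            run_start = None
    return regions


# Invented toy alignment built around the motif quoted for family 11 xylanases.
toy_alignment = ["MKADGGTYDIYQT", "MRSDGGTYDIYET", "LKTDGGTYDIYNT"]
print(conserved_regions(toy_alignment))
```

The same function works on aligned DNA strings if `min_len` is set to a
nucleotide length such as 10.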

"Sequence overlap extension PCR reaction (SOE-PCR)": The term 

"SOE-PCR" is a standard PCR reaction protocol known in the art,
and is in the present context defined and performed according to
standard protocols defined in the art ("PCR A practical approach"
IRL Press, (1991)).

"primer": The term "primer" used herein especially in 

connection with a PCR reaction is an oligonucleotide (especially
a "PCR-primer") defined and constructed according to general
standard specifications known in the art ("PCR A practical
approach" IRL Press, (1991)).

"A primer directed to a sequence:" The term "a primer 

directed to a sequence" means that the primer (preferably to be
used in a PCR reaction) is constructed so that it exhibits at least
80% sequence identity to the sequence part of interest,
more preferably at least 90% sequence identity to the
sequence part of interest, to which said primer consequently is
"directed". The primer is designed to anneal specifically, at a
given temperature, to the region it is directed towards.
Identity at the 3' end of the primer is especially essential
for the function of the polymerase, i.e. the ability of a
polymerase to extend the annealed primer.
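
The two criteria in this definition, overall identity and identity at
the 3' end, can be checked with a minimal Python sketch. It assumes the
primer and the target region are of equal length and already in the
same orientation. The number of 3' positions that must match exactly
(`three_prime_len`) is a hypothetical parameter; the definition only
says that 3' identity is essential without giving a number.

```python
def primer_identity(primer: str, target: str) -> float:
    """Percent identity between a primer and an equally long target region."""
    if len(primer) != len(target):
        raise ValueError("primer and target region must have equal length")
    matches = sum(p == t for p, t in zip(primer.upper(), target.upper()))
    return 100.0 * matches / len(primer)


def is_directed_to(
    primer: str,
    target: str,
    min_identity: float = 80.0,
    three_prime_len: int = 3,  # assumption of this sketch, not from the text
) -> bool:
    """True if the primer meets the identity cut-off and matches at its 3' end."""
    three_prime_ok = (
        primer[-three_prime_len:].upper() == target[-three_prime_len:].upper()
    )
    return primer_identity(primer, target) >= min_identity and three_prime_ok


# Invented 10-nt target region, for illustration only.
target = "ACGTACGTAC"
print(is_directed_to("TCGTACGTAC", target))  # one 5' mismatch
print(is_directed_to("ACGTACGTAG", target))  # mismatch at the 3' end
```

Setting `min_identity=90.0` applies the more preferred threshold.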

"Polypeptide": Polymers of amino acids, sometimes referred
to as proteins. The sequence of amino acids determines the
folded conformation that the polypeptide assumes, and this in
turn determines biological properties such as activity. Some
polypeptides consist of a single polypeptide chain (monomeric),
whilst others comprise several associated polypeptides
(multimeric). All enzymes and antibodies are polypeptides.

"Enzyme" A protein capable of catalysing chemical 
reactions. Specific types of enzymes are a) hydrolases
including amylases, cellulases and other carbohydrases,
proteases, and lipases, b) oxidoreductases, c) ligases, d)
lyases, e) isomerases, f) transferases, etc. Of specific
interest in relation to the present invention are enzymes used
in detergents, such as proteases, lipases, cellulases,
amylases, etc.

"known sequence" is the term used for the DNA sequences of 
which the full length sequence has been determined or of which at
least the sequence of one conserved region is known.

"unknown sequence" is the term used for the DNA sequences
amplified directly from uncultivated micro-organisms comprised in
e.g. a soil sample used as the starting material.

"Full length DNA sequence" means a structural gene sequence
encoding a complete polypeptide with an activity of interest.

"un-cultivated" means that the micro-organism comprising
the unknown DNA sequence need not be isolated (i.e. to provide
a population comprising only identical micro-organisms) before
amplification (e.g. by PCR).

The term "an activity of interest" means any activity for
which screening methods are known.

The term "un-cultivable micro-organisms" defines
micro-organisms which cannot be cultivated according to methods known in
the art.

The term "DNA" should be interpreted as also covering other
polynucleotide sequences including RNA.

The term "linking" sequences means effecting a covalent 
binding of DNA sequences. 

The term "hybrid sequences" means sequences of different
origin merged together into one sequence.

The term "structural gene sequence" means a DNA sequence
coding for a polypeptide with an activity.

The term "naturally occurring DNA" means DNA which has not
been subjected to biological or biochemical mutagenesis. By
biological mutagenesis is meant "in vivo" mutagenesis, i.e.
propagation under controlled conditions in a living organism,
such as a "mutator" strain, in order to create genetic
diversity. By biochemical mutagenesis is meant "in vitro"
mutagenesis, such as error-prone PCR, oligonucleotide-directed
site-specific or random mutagenesis, etc.

DETAILED DESCRIPTION OF THE INVENTION 

It is the object of the present invention to provide a method
for providing novel DNA sequences encoding polypeptides with an
activity of interest from micro-organisms without having to
cultivate said micro-organisms.

The inventors of the present invention have found that
PCR-screening using primers designed on the basis of known
homologous regions, such as conserved regions, can be used for
providing novel DNA sequences. Despite the fact that known
homologous regions, such as conserved regions, are used for
primer design, a vast number of unknown DNA sequences have been
provided. This will be described in the following and illustrated
in the Examples.

The DNA sequences provided are full length hybrid structural
gene sequences encoding complete polypeptides with an activity of
interest made up of one unknown sequence and one or two known
sequences.

According to the invention it is essential to identify at
least two homologous regions, such as conserved regions, in known
gene sequences with the activity of interest. One or two selected
known structural gene sequence(s) is (are) used as templates (i.e.
as starting sequence(s)) for finding and constructing novel DNA
structural gene sequences with an activity of interest.

Said homologous regions, such as conserved regions, can be
identified by alignment of polypeptides with the activity of
interest. The alignment may e.g. be made using the computer
program ClustalW or other similar programs available on the market.


One known structural gene as the starting sequence 

In the case of using one known structural gene sequence as the
starting sequence it will typically be comprised in a plasmid or
vector or the like. A part of the sequence between the two
identified homologous regions, such as conserved regions, is
deleted to avoid contamination by the wild-type structural gene.

The known DNA sequence, with the homologous regions, such as
conserved regions, placed at the ends, is linked to an unknown
DNA sequence amplified directly or indirectly from a sample
comprising micro-organisms.

The identified homologous regions, such as conserved regions,
must be at a suitable distance from each other, such as 10 or more
base pairs apart. It is preferred to use homologous regions,
such as conserved regions, placed at each end of the known
full length structural gene.

However, if knowledge about a specific function (e.g. active
site) of a domain (i.e. part of the structural gene sequence) is
available it may be advantageous to use conserved regions placed
in proximity to and on each side of said domain as the basis for
the PCR amplification to provide novel DNA sequences according to
the invention, which will be described below in detail.

Two known genes as starting sequences

In the case of using two known structural genes as the starting
sequences at least one homologous region, such as a conserved
region, should be identified in each of the two sequences within
the polypeptide coding region.

In both cases (i.e. one or two known genes as starting
sequences) the homologous regions, such as conserved regions,
should preferably be situated at each end of the structural
gene(s), i.e. in the sequences encoding the N-terminal end
(named Part A in figure 4) and the C-terminal end (named Part C
in figure 4), respectively, of the known part of the hybrid
polypeptide.

In the first aspect the invention relates to a method for
providing novel DNA sequences encoding polypeptides with an
activity of interest, which comprises the following steps:

i) PCR amplification of said DNA with PCR primers with homology
to (a) known gene(s) encoding a polypeptide with an activity of
interest,

ii) linking the obtained PCR product to a 5' structural gene
sequence and a 3' structural gene sequence,

iii) expressing said resulting hybrid DNA sequence,

iv) screening for hybrid DNA sequences encoding a polypeptide
with said activity of interest or related activity,

v) isolating the hybrid DNA sequence identified in step iv).

In step i) the part between the corresponding homologous
regions, such as conserved regions, of the unknown structural
gene is amplified.

In an embodiment the PCR amplification in step i) is performed
using naturally occurring DNA or RNA as template.

In another embodiment the micro-organism has not been
subjected to "in vitro" selection.

The PCR amplification may be performed on a sample containing
DNA or RNA from un-isolated micro-organisms. According to the
invention no prior knowledge about the unknown sequence is
required.

In an embodiment of the invention said 5' and 3' structural
gene sequences originate from two different known structural gene
sequences encoding polypeptides having the same activity or
related activity.

The 5' structural gene sequence and the 3' structural gene
sequence may also originate from the same known structural gene
encoding a polypeptide with the activity of interest or from two
different known structural gene sequences encoding polypeptides
having different activities. In the latter case it is preferred
that at least one of the starting sequences originates from a
known structural gene sequence encoding a polypeptide with the
activity of interest.

In a preferred embodiment of the method of the invention the
known structural gene is situated in a plasmid or a vector. In
said case the method comprises the following steps:

i) PCR amplification of DNA from micro-organisms with
PCR primers being homologous to conserved regions of
a known gene encoding a polypeptide with an activity of
interest,

ii) cloning the obtained PCR product into a gene encoding
a polypeptide having said activity of interest, where
said gene is not identical to the gene from which the
PCR product is obtained, which gene is situated in an
expression vector,

iii) transforming said expression vector into a suitable
host cell,

iiia) culturing said host cell under suitable conditions,

iv) screening for clones comprising a DNA sequence
originated from the PCR amplification in step i)
encoding a polypeptide with said activity of
interest or a related activity,

v) isolating the DNA sequence identified in step iv).

According to this embodiment one known structural gene
sequence is used as the starting sequence. It is to be understood
that the PCR product obtained in step i) is cloned into a known
gene where a part of the DNA sequence, between the conserved
regions, is deleted (i.e. cut out) or in another way substituted
with the PCR product. The deleted part of the known gene
comprised in the vector may have any suitable size, typically
between 10 and 5000 bp, such as between 10 and 3000 bp.

A general problem is that, when amplifying DNA sequences
encoding polypeptides with an activity by PCR, the obtained PCR
product (i.e. being a part of an unknown gene) does not normally
encode a polypeptide with the desired activity of interest.

Therefore, according to the invention the complete full length
structural gene, encoding a functional polypeptide, is provided
by cloning (i.e. by substituting) the PCR product of the unknown
structural gene into the known gene situated on the expression
vector.

It should be emphasised that the DNA mentioned in step i), to
be PCR amplified, need not comprise a complete gene encoding a
functional polypeptide. This is advantageous as only a smaller
region of the DNA of the micro-organism(s) in question needs to be
amplified.

The novel DNA sequences obtained according to the invention
consist of the PCR product merged or linked into the known gene,
having a number of nucleotides between the conserved regions
deleted. The PCR product is inserted into the known gene between
the two ends of the cut open vector by overlapping homologous
regions of about 10 to 200 bp at each end of the vector.

The resulting novel hybrid DNA sequences constitute complete
full length genes comprising the PCR product and encode a
polypeptide with the activity of interest.

It is to be understood that it is not absolutely necessary to
delete a part of the known gene sequence. However, if a part of
the known gene sequence is not deleted, re-ligation restores the
wild-type activity of the known gene and thus gives a high
number of wild-type background clones, which would make the
screening procedure more time consuming and cumbersome.
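
At the sequence level, substituting the PCR product for the region
between the conserved regions of the known gene can be sketched in
Python as follows. This is a string-level illustration only: it assumes
the PCR product begins and ends with the same conserved sequences that
flank the deleted part of the known gene, and it ignores reading frame,
strand and cloning details. The function name and the example
sequences are invented.

```python
def build_hybrid(known_gene: str, upstream: str, downstream: str, pcr_product: str) -> str:
    """Replace the region between two conserved sequences with a PCR product.

    known_gene:  known structural gene (coding strand)
    upstream:    conserved sequence nearer the 5' end
    downstream:  conserved sequence nearer the 3' end
    pcr_product: amplified core, starting with `upstream` and ending
                 with `downstream` (assumption of this sketch)
    """
    up = known_gene.find(upstream)
    if up == -1:
        raise ValueError("upstream conserved region not found in known gene")
    down = known_gene.find(downstream, up + len(upstream))
    if down == -1:
        raise ValueError("downstream conserved region not found in known gene")
    if not (pcr_product.startswith(upstream) and pcr_product.endswith(downstream)):
        raise ValueError("PCR product is not flanked by the conserved regions")
    five_prime_part = known_gene[:up]
    three_prime_part = known_gene[down + len(downstream):]
    return five_prime_part + pcr_product + three_prime_part


# Invented toy sequences: conserved regions GATGGT and GAAGGT.
known = "ATGAAA" + "GATGGT" + "CCCCCCCCC" + "GAAGGT" + "TTTTAA"
product = "GATGGT" + "ACGACGACG" + "GAAGGT"
hybrid = build_hybrid(known, "GATGGT", "GAAGGT", product)
print(hybrid)
```

Because the wild-type core is dropped from the result, a construct
built this way cannot regenerate the known gene, which mirrors the
reason given above for deleting part of the known sequence.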

The PCR amplification in step i) can be performed on both
cultivable and uncultivable micro-organisms by direct or
indirect amplification of DNA from the genomic material of the
micro-organisms in the environment (i.e. directly or indirectly
from the sample taken).

The micro-organisms

The micro-organisms from which the unknown DNA sequences are
derived may be micro-organisms which cannot today be cultivated.
This is possible as the DNA sequences can be amplified by PCR
without the need first to cultivate and isolate the
micro-organisms comprising the unknown DNA sequence(s).

It is however to be understood that the method of the
invention can also be used for providing novel DNA sequences
derived from micro-organisms which can be cultivated.

Therefore the method of the invention can be performed on both
cultivable and un-cultivable organisms as the micro-organisms in
question do not, according to the method of the invention, need
to be cultivated and isolated from, e.g., the soil sample
comprising micro-organisms.

Starting material 

The starting material, i.e. the sample comprising
micro-organisms with the target unknown DNA sequences, may for instance
be an environmental sample of plant or soil material, animal or
insect dung, insect gut, animal stomach, a marine sample of sea
or lake water, sewage, waste water, etc., comprising one or, as
in most cases, a vast number of different cultivable and/or
un-cultivable micro-organisms.

If the genomic material of the micro-organisms is readily
accessible the PCR amplification may be performed directly on the
sample. In other cases a pre-purification and isolation procedure
of the genomic material is needed.

Smalla et al. (1993), J. Appl. Bacteriol. 74, p. 78-85; Smalla
et al. (1993), FEMS Microbiol Ecol 13, p. 47-58, describe how to
extract DNA directly from micro-organisms in the environment
(i.e. the sample).

Borneman et al. (1996), Applied and Environmental
Microbiology, 1935-1943, describes a method for extracting DNA
from soils.

A commercially available kit for isolating DNA from
environmental samples, such as e.g. soils, can be purchased from
BIO 101 under the tradename FastDNA® SPIN Kit.

Seamless™ Cloning kit (catalogue no. Stratagene 214400) is a
commercial kit suitable for cloning of any DNA fragment into any
desired location, e.g. a vector, without the limitation of
naturally occurring restriction sites.

PCR amplification of DNA and/or RNA of micro-organisms in the
environment is described by Erlich, (1989), PCR Technology.
Principles and Applications for DNA Amplification, New
York/London, Stockton Press; Pillai et al., (1991), Appl.
Environ. Microbiol., 58, p. 2712-2722.

Other methods for PCR amplifying microbial DNA directly from a
sample are described in Molecular Microbial Ecology Manual,
(1995), Edited by Akkermans et al. A suitable method for
microbial DNA from soil samples is described by Jan Dirk van
Elsas et al., (1995), Molecular Microbial Ecology Manual 2.7.2,
p. 1-10.

Stein et al., (1996), J. Bacteriol., Vol. 178, No. 2, p. 591-599,
describes a method for isolating DNA from un-cultivated
prokaryotic micro-organisms and cloning DNA fragments therefrom.

The PCR primers being homologous to conserved regions of the
known gene encoding a polypeptide with an activity of interest
are synthesized according to standard methods known in the art
(see for instance EP 684 313 from Hoffmann-La Roche AG) on the
basis of knowledge of conserved regions in the polypeptide with
the activity of interest.

Said PCR primers may be identical to at least a part of the
conserved regions of the known gene. However, said primers may
advantageously be synthesized to differ in one or more positions.

Further, a number of different PCR primers homologous to the
conserved regions may be used at the same time in step i) of the
method of the invention.

The cultivable or uncultivable micro-organisms may be both
prokaryotic organisms such as bacteria, or eukaryotic organisms
including algae, fungi and protozoa.

Examples of un-cultivable organisms include, without being
limited thereto, extremophiles and planktonic marine organisms
etc.

The group of cultivable organisms includes bacteria and fungal
organisms, such as filamentous fungi or yeasts.

In the case of using DNA from cultivable organisms the PCR
amplification in step i) may be performed on one or more
polynucleotides comprised in a vector, plasmid or the like, such
as on a cDNA library.

Specific examples of "an activity of interest" include
enzymatic activity and anti-microbial activity.

In a preferred embodiment of the invention the activity of
interest is an enzymatic activity, such as an activity selected
from the group comprising phosphatases, oxidoreductases (E.C.
1), transferases (E.C. 2); hydrolases (E.C. 3), such as esterases
(E.C. 3.1), in particular lipases and phytase; such as
glycosidases (E.C. 3.2), in particular xylanase, cellulases,
hemicellulases, and amylase; such as peptidases (E.C. 3.4), in
particular proteases; lyases (E.C. 4); isomerases (E.C. 5);
ligases (E.C. 6).

The host cell used in step iii) may be any suitable cell which
can express the gene encoding the polypeptide with the activity
of interest. The host cell may for instance be a yeast, such as
a strain of Saccharomyces, in particular Saccharomyces
cerevisiae, or a bacterium, such as a strain of Bacillus, in
particular of Bacillus subtilis, or a strain of Escherichia coli.

Clones found to comprise a DNA sequence originated from the
PCR amplification in step i) may be screened for any activity of
interest. Examples of such activities include enzymatic activity,
anti-microbial activity or biological activities.

The polypeptide with the activity of interest may then be
tested for a desired performance under specific conditions and/or
in combination with e.g. chemical compounds or agents. In the case
where the polypeptide is an enzyme, e.g. the wash performance,
textile dyeing, hair dyeing or bleaching properties, or effect in
feed or food may be assayed to identify polypeptides with a
desired property.

Identification of conserved regions of prokaryote xylanases

Figure 2 shows an alignment of prokaryote xylanases from
the family 11 of glycosyl hydrolases (B. Henrissat, Biochem J,
280:309-316 (1991)). There are several regions where the amino
acids are identical or almost identical, i.e. conserved
regions.

Examples of homologous regions or conserved regions in
prokaryotic xylanases from family 11 of glycosyl hydrolases (B.
Henrissat, (1991), Biochem J 280:309-316) are the sequences
"DGGTYDIY" (SEQ ID NO 3), position 145-152, and "EGYQSSG" (SEQ ID
NO 4), position 200-206, in the upper polypeptide shown in
figure 2.

Based on e.g. said regions degenerated PCR primers can be
designed. These degenerated PCR primers can amplify unknown DNA
sequences coding for polypeptides (i.e. referred to as PCR
products below) which are homologous to the known
polypeptide(s) in question (i.e. SEQ ID NO 2) flanked by the
conserved regions.
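
Why degenerated primers are needed can be made concrete by counting how
many different nucleotide sequences could encode a conserved peptide.
The Python sketch below uses the number of codons per amino acid in the
standard genetic code; the function name is hypothetical, and the count
is an upper bound on primer degeneracy rather than a primer design
procedure.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_variants(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    total = 1
    for residue in peptide.upper():
        if residue not in CODON_COUNTS:
            raise ValueError(f"unknown amino acid: {residue!r}")
        total *= CODON_COUNTS[residue]
    return total


# The two conserved regions quoted for family 11 xylanases.
for motif in ("DGGTYDIY", "EGYQSSG"):
    print(motif, 3 * len(motif), "nt,", coding_variants(motif), "coding variants")
```

Residues with six codons, such as serine, inflate the count quickly,
which is one reason a primer may be directed to only part of a
conserved region.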

The PCR products obtained can be cloned into a plasmid and
sequenced to check if they contain conserved regions and are
homologous to the known structural gene sequence(s).

A homologous PCR product is, however, no guarantee that
the sequence codes for part of a polypeptide having the
desired activity of interest.

Therefore, according to the method of the invention, one or
more steps selecting DNA sequences encoding polypeptides having
the activity of interest follow the construction of the novel
hybrid DNA sequences.

The unknown DNA sequences

When the method of the invention is performed on DNA from
samples of uncultivated organisms it is advantageous to screen
for gene products with the activity of interest.

A suitable method for doing this is to link the PCR
products to the 5' sequence upstream of the first conserved
region and the 3' sequence downstream of the second conserved
region, respectively, from the known gene sequence.

The product of the unknown gene sequence linked to an
N-terminal and a C-terminal part of a known gene product is then
screened for the activity of interest.

The N-terminal and C-terminal parts can originate from the
same gene product, but this is not a prerequisite for activity.
The N-terminal and C-terminal parts may also originate from
different gene products as long as they originate from the same
polypeptide family, e.g. the same family of glycosyl hydrolases.

One method to link the unknown gene sequence with the known
sequences is to clone the PCR product into a known gene,
encoding a polypeptide having the activity of interest, which
has had the sequences between the conserved regions removed.

Another method is to merge the PCR product, the N-terminal
part and the C-terminal part by SOE-PCR (splicing by overlap
extension PCR), e.g. as shown in figure 1 and described in
detail in Example 1. Other methods known in the art may also be
used.
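
The joining logic of splicing by overlap extension can be pictured in silico as merging fragments whose ends share an identical stretch of sequence. The following is a simplified sketch on single strands only; it ignores annealing temperatures, strand orientation and polymerase extension, and the three toy fragments are hypothetical sequences invented for illustration, not sequences from this document.

```python
def splice_by_overlap(left, right, min_overlap=8):
    """Join two fragments where the end of `left` equals the start of `right`.

    The longest shared overlap of at least `min_overlap` bases is used, and
    the overlapping stretch is kept only once in the product.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("fragments do not share a sufficiently long overlap")


# Hypothetical toy fragments: A/B share 10 bases, B/C share 9 bases.
part_a = "ATGAAACCCGGGTTTACGT"
part_b = "GGGTTTACGTCCATGCAAGCTTGG"
part_c = "CAAGCTTGGTAA"

hybrid = splice_by_overlap(splice_by_overlap(part_a, part_b), part_c)
print(hybrid, len(hybrid))
```

In the laboratory procedure the overlaps are supplied by the primers: the ends of the PCR product carry the same sequence as the adjoining ends of the N-terminal and C-terminal parts, so the fragments can prime each other.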

In a second aspect the invention relates to a novel DNA 
sequence provided by the method of the invention and the 
polypeptide encoded by said novel DNA sequence. 


MATERIALS AND METHODS 

Pulpzyme® is a xylanase derived from Bacillus sp. AC13, NCIMB
No. 40482, and is described in WO 94/01532 from Novo Nordisk A/S.

AZCL Birch xylan (MegaZyme, Australia).

Plasmids: 

The Aspergillus expression vector pHD414 is a derivative of
the plasmid p775 (described in EP 238 023). The construction of
pHD414 is further described in WO 93/11249.

The 43 kD EG V endoglucanase cDNA from H. insolens
(disclosed in WO 91/17243) is cloned into pHD414 in such a way
that the endoglucanase gene is transcribed from the TAKA-promoter.
The resulting plasmid is named pCaHj418.

Kits 

QIAquick PCR Purification Kit Protocol

Taq deoxy terminal cycle sequencing kit (Perkin-Elmer, USA)

AmpliTaq Gold polymerase (Perkin-Elmer, USA)

Micro-organisms 
Bacteria 

Electromax DH10B E. coli cells (GIBCO BRL)


Fungal micro-organisms: 

Cylindrocarpon sp.: Isolated from marine sample, the Bahamas
Classification: Ascomycota, Pyrenomycetes, Hypocreales,
unclassified

Fusarium anguioides Sherbakoff IFO 4467
Classification: Ascomycota, Pyrenomycetes, Hypocreales,
Hypocreaceae

Gliocladium catenulatum Gillman & Abbott CBS 227.48
Classification: Ascomycota, Pyrenomycetes, Hypocreales,
Hypocreaceae

Humicola nigrescens Omvik CBS 819.73
Classification: Ascomycota, Pyrenomycetes, Sordariales, (fam.
unclassified)

Trichothecium roseum IFO 5372

Plates 

LB-ampicillin plates: 10 g Bacto-tryptone, 5 g Bacto yeast
extract, 10 g NaCl, in 1 litre water, 2% agar, 0.1% AZCL Birch
xylan, 50 µg/ml ampicillin.

Equipment 
Applied Biosystems 373A automated sequencer

PCR Amplification 

All polymerase chain reactions are carried out under
standard conditions as recommended by Perkin-Elmer using AmpliTaq
Gold polymerase.

Isolation of Environmental DNA 

DNA is isolated from an environmental sample using the FastDNA®
SPIN Kit for Soil according to the manufacturer's instructions.

Methods used in Example 3 

Strains and growth conditions 

The fungal strains listed above were streaked on PDA
plates containing 0.5 % Avicel, and examined under a microscope
to avoid obvious mistakes and contaminations. The strains were
cultivated in shake flasks (125 rpm and 26 °C) containing 30 ml
PD medium (to initiate the growth) and 150 ml of BA growth
medium for cellulase induction.

The production of cellulases in culture supernatants
(typically after 3, 5, 7 and 9 days of growth) was assayed
using 0.1 % AZCL-HE-cellulose in a plate assay at pH 3, pH 7 and
pH 10. The mycelia were harvested and stored at -80 °C.

Preparation of RNase-free glassware, tips and solutions

All glassware used in RNA isolations was baked at +250 °C
for at least 12 hours. Eppendorf tubes, pipette tips and plastic
columns were treated in 0.1 % diethylpyrocarbonate (DEPC) in
EtOH for 12 hours, and autoclaved. All buffers and water
(except Tris-containing buffers) were treated with 0.1 % DEPC
for 12 hours at 37 °C, and autoclaved.

Extraction of total RNA 

The total RNA was prepared by extraction with guanidinium
thiocyanate followed by ultracentrifugation through a 5.7 M
CsCl cushion [Chirgwin et al. (1979) Biochemistry 18, 5294-5299]
using the following modifications. The frozen mycelia were
ground in liquid N2 to a fine powder with a mortar and pestle,
followed by grinding in a precooled coffee mill, and immediately
suspended in 5 vols of RNA extraction buffer (4 M GuSCN,
0.5 % Na-laurylsarcosine, 25 mM Na-citrate, pH 7.0, 0.1 M
β-mercaptoethanol). The mixture was stirred for 30 min at RT
and centrifuged (20 min, 10 000 rpm, Beckman) to pellet the
cell debris. The supernatant was collected, carefully layered
onto a 5.7 M CsCl cushion (5.7 M CsCl, 0.1 M EDTA, pH 7.5,
0.1 % DEPC; autoclaved prior to use) using 26.5 ml supernatant
per 12.0 ml CsCl cushion, and centrifuged to obtain the total RNA
(Beckman, SW 28 rotor, 25 000 rpm, RT, 24 h). After centrifugation
the supernatant was carefully removed and the bottom of
the tube containing the RNA pellet was cut off and rinsed with
70 % EtOH. The total RNA pellet was transferred into an Eppendorf
tube, suspended in 500 µl TE, pH 7.6 (if difficult, heat
occasionally for 5 min at 65 °C), phenol extracted and precipitated
with ethanol for 12 h at -20 °C (2.5 vols EtOH, 0.1 vol 3 M
NaAc, pH 5.2). The RNA was collected by centrifugation, washed
in 70 % EtOH, and resuspended in a minimum volume of DEPC-DIW.
The RNA concentration was determined by measuring OD 260/280.
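
The conversion from absorbance readings to an RNA concentration can be sketched as follows. The sketch assumes the common spectrophotometric convention that one A260 unit corresponds to about 40 µg/ml of RNA at a 1 cm path length; this factor, the function names and the example readings are assumptions for illustration and are not stated in the text.

```python
# Assumed convention: 1 A260 unit ~ 40 ug/ml RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate the RNA concentration of the undiluted sample in ug/ml."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio, commonly used as a rough indicator of purity."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for a sample measured at a 1:100 dilution.
print(rna_concentration_ug_per_ml(0.25, dilution_factor=100))
print(purity_ratio(0.5, 0.25))
```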

Isolation of poly(A)+ RNA

The poly(A)+ RNAs were isolated by oligo(dT)-cellulose
affinity chromatography [Aviv & Leder (1972), Proc. Natl. Acad.
Sci. U.S.A. 69, 1408-1412]. Typically, 0.2 g of oligo(dT) cellulose
(Boehringer Mannheim, Germany) was preswollen in 10 ml of 1 x
column loading buffer (20 mM Tris-Cl, pH 7.6, 0.5 M NaCl, 1 mM
EDTA, 0.1 % SDS), loaded onto a DEPC-treated, plugged plastic
column (Poly Prep Chromatography Column, Bio Rad), and
equilibrated with 20 ml 1 x loading buffer. The total RNA (1-2 mg)
was heated at 65 °C for 8 min, quenched on ice for 5 min, and,
after addition of 1 vol 2 x column loading buffer to the RNA
sample, loaded onto the column. The eluate was collected and
reloaded 2-3 times, heating the sample as above and quenching
on ice prior to each loading. The oligo(dT) column was washed
with 10 vols of 1 x loading buffer, then with 3 vols of medium
salt buffer (20 mM Tris-Cl, pH 7.6, 0.1 M NaCl, 1 mM EDTA,
0.1 % SDS), followed by elution of the poly(A)+ RNA with 3 vols of
elution buffer (10 mM Tris-Cl, pH 7.6, 1 mM EDTA, 0.05 % SDS)
preheated to +65 °C, collecting 500 µl fractions. The OD260
was read for each collected fraction, and the mRNA-containing
fractions were pooled and ethanol precipitated at -20 °C for
12 h. The poly(A)+ RNA was collected by centrifugation,
resuspended in DEPC-DIW and stored in 5-10 µg aliquots at -80 °C.

cDNA synthesis 

First strand synthesis 

Double-stranded cDNA was synthesized from 5 µg of poly(A)+
RNA by the RNase H method (Gubler et al. (1983) Gene 25,
263-269; Sambrook et al. (1989), Molecular Cloning: A Laboratory
Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York) using the hair-pin modification. The poly(A)+ RNA
(5 µg in 5 µl of DEPC-treated water) was heated at 70 °C for 8
min in a pre-siliconized, RNase-free Eppendorf tube, quenched
on ice, and combined in a final volume of 50 µl with reverse
transcriptase buffer (50 mM Tris-Cl, pH 8.3, 75 mM KCl, 3 mM
MgCl2, 10 mM DTT, Bethesda Research Laboratories) containing 1
mM each of dATP, dGTP and dTTP, and 0.5 mM of 5-methyl-dCTP
(Pharmacia), 40 units of human placental ribonuclease inhibitor
(RNasin, Promega), 1.45 µg of oligo(dT)18-Not I primer
(Pharmacia) and 1000 units of Superscript II RNase H- reverse
transcriptase (Bethesda Research Laboratories). First-strand
cDNA was synthesized by incubating the reaction mixture at
45 °C for 1 h. After synthesis, the mRNA:cDNA hybrid mixture was
gel filtered through a MicroSpin S-400 HR (Pharmacia) spin
column according to the manufacturer's instructions.

Second strand synthesis 

After the gel filtration, the hybrids were diluted in 250
µl of second strand buffer (20 mM Tris-Cl, pH 7.4, 90 mM KCl,
4.6 mM MgCl2, 10 mM (NH4)2SO4, 0.16 mM β-NAD+) containing 200 µM
of each dNTP, 60 units of E. coli DNA polymerase I (Pharmacia),
5.25 units of RNase H (Promega) and 15 units of E. coli DNA
ligase (Boehringer Mannheim). Second strand cDNA synthesis was
performed by incubating the reaction tube at 16 °C for 2 h, and
an additional 15 min at 25 °C. The reaction was stopped by
addition of EDTA to 20 mM final concentration followed by phenol



SUBSTITUTE SHEET (RULE 26) 



WO 97/43409 



PCT/DK97/00216 



21 

and chloroform extractions.

Mung bean nuclease treatment

The double-stranded (ds) cDNA was ethanol precipitated at
-20 °C for 12 hours by addition of 2 vols of 96 % EtOH and 0.2 vol
10 M NH4Ac, recovered by centrifugation, washed in 70 % EtOH, dried
(SpeedVac), and resuspended in 30 µl of Mung bean nuclease
buffer (30 mM NaAc, pH 4.6, 300 mM NaCl, 1 mM ZnSO4, 0.35 mM
DTT, 2 % glycerol) containing 25 units of Mung bean nuclease
(Pharmacia). The single-stranded hair-pin DNA was clipped by
incubating the reaction at 30 °C for 30 min, followed by
addition of 70 µl 10 mM Tris-Cl, pH 7.5, 1 mM EDTA, phenol
extraction, and ethanol precipitation with 2 vols of 96 % EtOH
and 0.1 vol 3 M NaAc, pH 5.2, on ice for 30 min.

Blunt-ending with T4 DNA polymerase

The ds cDNAs were recovered by centrifugation (20 000 rpm,
30 min), and blunt-ended with T4 DNA polymerase in 30 µl of T4
DNA polymerase buffer (20 mM Tris-acetate, pH 7.9, 10 mM MgAc,
50 mM KAc, 1 mM DTT) containing 0.5 mM each dNTP and 5 units of
T4 DNA polymerase (New England Biolabs) by incubating the
reaction mixture at +16 °C for 1 hour. The reaction was stopped by
addition of EDTA to 20 mM final concentration, followed by
phenol and chloroform extractions and ethanol precipitation for
12 h at -20 °C by adding 2 vols of 96 % EtOH and 0.1 vol of 3 M
NaAc, pH 5.2.

Adaptor ligation, Not I digestion and size selection

After the fill-in reaction the cDNAs were recovered by
centrifugation as above, washed in 70 % EtOH, and the DNA pellet
was dried in a SpeedVac. The cDNA pellet was resuspended in 25 µl
of ligation buffer (30 mM Tris-Cl, pH 7.8, 10 mM MgCl2, 10 mM
DTT, 0.5 mM ATP) containing 2.5 µg non-palindromic BstXI
adaptors (1 µg/µl, Invitrogen) and 30 units of T4 ligase (Promega),
and the reaction mix was incubated at +16 °C for 12 h. The reaction
was stopped by heating at +65 °C for 20 min, and then placing on
ice for 5 min. The adapted cDNA was digested with Not I restriction
enzyme by addition of 20 µl autoclaved water, 5 µl of 10 x Not
I restriction enzyme buffer (New England Biolabs) and 50 units
of Not I (New England Biolabs), followed by incubation for 2.5
hours at +37 °C. The reaction was stopped by heating the sample
at +65 °C for 10 min. The cDNAs were size-fractionated by
agarose gel electrophoresis on a 0.8 % SeaPlaque GTG low melting
temperature agarose gel (FMC) in 1 x TBE (in autoclaved water)
to separate unligated adaptors and small cDNAs. The gel was run
for 12 hours at 15 V, the cDNA was size-selected with a cut-off
at 0.7 kb by cutting out the lower part of the agarose gel, and
the cDNA was concentrated by running the gel backwards until it
appeared as a compressed band on the gel. The cDNA (in agarose)
was cut out from the gel, and the agarose was melted at 65 °C in
a 2 ml Biopure Eppendorf tube (Eppendorf). The sample was
treated with agarase by adding 0.1 vol of 10 x agarase buffer
(New England Biolabs) and 2 units of agarase per 100 µl molten
agarose to the sample, followed by incubation at 45 °C for 1.5 h.
The cDNA sample was phenol and chloroform extracted, and
precipitated by addition of 2 vols of 96 % EtOH and 0.1 vol of
3 M NaAc, pH 5.2, at -20 °C for 12 h.

EXAMPLES

Example 1

Providing novel DNA sequences encoding polypeptides with
xylanase activity

Novel sequences with xylanase activity were provided
according to the method of the invention using the glycosyl
hydrolase family 11 xylanase derived from Bacillus sp. (SEQ ID
NO 1) as the known structural gene sequence.

Identification of conserved regions by alignment

An amino acid sequence alignment of ten family 11 xylanases
revealed at least 3 conserved sequences. Two of these conserved
sequences are used to design appropriate PCR primers for
amplification of unknown DNA sequences.
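
The coordinates quoted in this example for the two conserved sequences used for primer design, "DGGTYDIY" (nucleotides 433-456 of SEQ ID NO 1) and "EGYQSSG" (nucleotides 631-651), can be cross-checked against the sequence listing. The sketch below translates the two stretches as transcribed from the SEQ ID NO 1 listing and converts between codon and nucleotide numbering; the helper names are hypothetical, and the transcribed fragments inherit any errors in the listing as reproduced here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Fragments transcribed from the SEQ ID NO 1 listing.
MOTIF_1_DNA = "GATGGAGGAACATATGATATCTAT"  # nucleotides 433-456
MOTIF_2_DNA = "GAAGGCTATCAAAGTAGCGGA"     # nucleotides 631-651


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codons_to_nucleotides(first_codon, last_codon):
    """1-based codon range -> 1-based nucleotide range (CDS starts at 1)."""
    return 3 * (first_codon - 1) + 1, 3 * last_codon


def nucleotides_to_codons(first_nt, last_nt):
    """1-based nucleotide range -> 1-based range of codons it touches."""
    return (first_nt - 1) // 3 + 1, (last_nt - 1) // 3 + 1


print(translate(MOTIF_1_DNA), nucleotides_to_codons(433, 456))
print(translate(MOTIF_2_DNA), nucleotides_to_codons(631, 651))
```

Codon numbers obtained this way refer to the reading frame of SEQ ID NO 1 itself and need not coincide with the position numbering of an alignment figure.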

The first conserved sequence, shown in SEQ ID NO 3, i.e.
"DGGTYDIY", corresponds to position 433-456 in SEQ ID NO 1.

The second conserved sequence, shown in SEQ ID NO 4, i.e.
"EGYQSSG", corresponds to position 631-651 in SEQ ID NO 1.

PCR amplification of the known and unknown partial structural
gene sequences

Initially the N-terminal end (i.e. Part A) and the
C-terminal end (i.e. Part C) of the known xylanase gene, in which
the unknown sequence (i.e. Part B) is to be inserted, were
amplified by PCR (see figure 4).

Part A was PCR amplified using two primers (i.e. primer
e and primer a_rc) and, as DNA template, a plasmid carrying the
known xylanase gene (i.e. SEQ ID NO 1).

Primer e (shown in SEQ ID NO 5 and figure 1) is an exact
N-terminal primer extended with a sequence which includes an
EcoRI restriction recognition site.

Primer a_rc (shown in SEQ ID NO 6 and figure 1) is the reverse
complement of position 411-432 in SEQ ID NO 1.

Part C was PCR amplified using the two primers mentioned
below (i.e. primer f and primer d_rc) and, as DNA template, a
plasmid carrying the known xylanase gene.

Primer f (shown in SEQ ID NO 7) is an exact reverse-complement
C-terminal primer extended with a sequence containing a SalI
restriction recognition site.

Primer d_rc (SEQ ID NO 8) was designed on the basis of
position 651-672 in SEQ ID NO 1.

Part B was PCR amplified using two primers (i.e. primer ab
and primer cd) and, as DNA template, DNA purified from a soil
sample using the FastDNA® SPIN Kit.

Primer ab (SEQ ID NO 9) has the exact sequence of position
411-432 in SEQ ID NO 1 extended with a degenerate xylanase
consensus sequence covering position 433-452 in SEQ ID NO 1.

Primer cd (SEQ ID NO 10) has the exact reverse
complement sequence of position 672-651 in SEQ ID NO 1 extended
with a degenerate xylanase consensus sequence covering position
650-631 in SEQ ID NO 1.
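
The relationship between the exact part of primer ab (position 411-432) and primer a_rc (its reverse complement) can be sketched as follows. The 48-nucleotide segment is transcribed from the SEQ ID NO 1 listing (positions 385-432) and may carry transcription errors; the helper names are hypothetical, and the printed strings are derived from that segment rather than copied from SEQ ID NO 6 or SEQ ID NO 9.

```python
# Nucleotides 385-432 of SEQ ID NO 1, as transcribed from the listing.
SEGMENT_START = 385
SEGMENT = "GGCAACTGGCGTCCACCAGGGGCAACGCCTAAGGGAACCATCACTGTT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def subsequence(first, last):
    """Return positions first..last (1-based, inclusive, SEQ ID NO 1 numbering)."""
    segment_end = SEGMENT_START + len(SEGMENT) - 1
    if not (SEGMENT_START <= first <= last <= segment_end):
        raise ValueError("requested range lies outside the transcribed segment")
    return SEGMENT[first - SEGMENT_START:last - SEGMENT_START + 1]


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


exact_part = subsequence(411, 432)            # exact 5' part of primer ab
antisense_part = reverse_complement(exact_part)  # orientation of primer a_rc

print(exact_part)
print(antisense_part)
```

Because the end of Part A and the start of Part B both carry this 22-nucleotide stretch, the two PCR products can anneal to each other in the subsequent SOE-PCR.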

The N-terminal part of the known xylanase gene (Part A) was
PCR amplified for 9 min at 94 °C followed by 30 cycles (45
seconds at 94 °C, 45 seconds at 50 °C and 1 min at 72 °C) and
finally for 7 min at 72 °C. This gave a PCR product of approx.
450 bp.




The C-terminal part (Part C) of the known xylanase gene was
PCR amplified for 9 min at 94 °C followed by 30 cycles (45
seconds at 94 °C, 45 seconds at 50 °C and 1 min at 72 °C) and
finally for 7 min at 72 °C. This gave a PCR product of approx.
100 bp.

The unknown sequences (Part B) were PCR amplified for 9 min
at 94 °C followed by 40 cycles (45 seconds at 94 °C, 45 seconds at
50 °C and 1 min at 72 °C) and finally for 7 min at 72 °C. This
gave a PCR product of approx. 260 bp.

The PCR products mentioned above were carefully purified to
remove residual template DNA, which can produce false positive
bands in the following SOE-PCR where the products are joined
together to form hybrid sequences.

Construction of hybrid sequences

Hybrid sequences containing the N- and C-terminal parts of
the known xylanase gene with the core part of unknown genes were
constructed by splicing by overlap extension PCR (SOE-PCR).

Equimolar amounts of the Part A, Part B and Part C PCR
products were mixed and PCR amplified under standard conditions
except that the reaction was started without any primers.

The reaction started with 9 min at 94 °C followed by 4
cycles (45 seconds at 94 °C, 45 seconds at 50 °C, 1 min at
72 °C), then primers e and f (SEQ ID NO 5 and 7, respectively)
were added, followed by 25 cycles (45 seconds at 94 °C, 45
seconds at 50 °C, 1 min at 72 °C) and finally 7 min at 72 °C.
This gave a SOE-PCR product of the expected size of approx. 770
bp.
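
The expected size of the SOE-PCR product follows from the sizes of the three parts minus the stretches they share. The sketch below assumes that Part A and Part B overlap over position 411-432 and that Part B and Part C overlap over position 651-672 of SEQ ID NO 1, as implied by the primer descriptions, and uses the approximate fragment sizes reported above; the function names are hypothetical.

```python
def stretch_length(first, last):
    """Length of a 1-based inclusive position range (either orientation)."""
    return abs(last - first) + 1


def spliced_length(fragment_lengths, overlap_lengths):
    """Length of a product made by joining consecutive overlapping fragments."""
    if len(overlap_lengths) != len(fragment_lengths) - 1:
        raise ValueError("need exactly one overlap per fragment junction")
    return sum(fragment_lengths) - sum(overlap_lengths)


approx_sizes = [450, 260, 100]            # Part A, Part B, Part C (approx. bp)
overlaps = [stretch_length(411, 432),     # A/B junction
            stretch_length(651, 672)]     # B/C junction

print(spliced_length(approx_sizes, overlaps))
```

With two overlaps of 22 bp each, the estimate comes to 766 bp, in line with the reported size of approx. 770 bp given that the fragment sizes are themselves approximate.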

Cloning of the hybrids

The SOE-PCR product was purified using the QIAquick PCR
Purification Kit Protocol and digested overnight with EcoRI and
SalI according to the manufacturer's recommendations. The
digested product was then ligated overnight at 16 °C into an
E. coli expression vector (in this case a vector where the
hybrid gene is under control of a temperature-sensitive lambda
repressor promoter).

The ligation mixture was transformed into Electromax DH10B
E. coli cells (GIBCO BRL) and plated on LB-ampicillin plates
containing 0.1 % AZCL Birch xylan. After induction of the
promoter (by increasing the temperature to 42 °C), xylanase-positive
colonies were identified as colonies surrounded by a blue halo.

Plasmid DNA was isolated from positive E. coli colonies using
standard procedures and sequenced with the Taq deoxy terminal
cycle sequencing kit (Perkin-Elmer, USA) using an Applied
Biosystems 373A automated sequencer according to the
manufacturer's instructions.

The sequence of a positive clone is shown in SEQ ID NO 11
and the corresponding protein sequence is shown in SEQ ID NO
12.

An alignment of the known xylanase sequence (SEQ ID NO 2)
and the novel sequence provided according to the method of
the invention can be seen in Figure 3. As can be seen, the two
protein sequences differ between the two identified conserved
regions (i.e. SEQ ID NO 3 and SEQ ID NO 4, respectively).

Example 2

Efficiency of the method of the invention

Degenerate primers were designed on the basis of conserved
regions identified by alignment of a number of family 5
cellulases and family 10 and 11 xylanases found on the Internet in
ExPASy under Prosite (Dictionary of protein sites and
patterns).

PCR amplification of a number of unknown structural gene
sequences from soil and cow rumen samples was performed with
various degenerate primers covering identified conserved region
sequences to show how effective the method of the invention
is.

The PCR products were cloned into the vector pCR™II,
provided with the original TA cloning kit from Invitrogen. Said
vector allows blue-white screening; the white colonies were
selected and the inserts were sequenced.

When editing the Sequence Listing below, all sequences
outside the two EcoRI sites in the polylinker were removed.
Therefore all sequences have a small additional part of the
polylinker (i.e. from the EcoRI site to the TT overhang) at
both ends of the sequences. These extensions are "GAATTCGGCT"
and "AAGCCG".
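
A reader who wants to work with the insert sequences alone could trim the two polylinker extensions named above. The sketch below is a minimal illustration; the function name and the toy insert are hypothetical, and it assumes the extensions appear exactly as quoted, with "GAATTCGGCT" at the start and "AAGCCG" at the end of the listed strand.

```python
FIVE_PRIME_EXTENSION = "GAATTCGGCT"
THREE_PRIME_EXTENSION = "AAGCCG"


def strip_polylinker(sequence,
                     five_prime=FIVE_PRIME_EXTENSION,
                     three_prime=THREE_PRIME_EXTENSION):
    """Remove the polylinker extensions from a listed sequence, if present."""
    trimmed = sequence.upper()
    if trimmed.startswith(five_prime):
        trimmed = trimmed[len(five_prime):]
    if trimmed.endswith(three_prime):
        trimmed = trimmed[:-len(three_prime)]
    return trimmed


# Hypothetical toy insert flanked by the two extensions.
listed = "GAATTCGGCT" + "ATGGCGTGGAAC" + "AAGCCG"
print(strip_polylinker(listed))
```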

1. PCR primers were designed on the basis of identified
conserved regions #1 GWNLGN and #2 (E/D)HLIFE of cellulases
from the glycosyl hydrolase family 5 aiming to provide novel
sequences with cellulase activity.

SEQ ID NO 13 and 14 show the sequences obtained from a soil
sample. SEQ ID NO 15 and 16 show the sequences obtained from a
cow rumen sample.

2. PCR primers were designed on the basis of identified
conserved regions #1 GWNLGN and #3 RA(S/T)GGNN of cellulases
from the glycosyl hydrolase family 5 aiming to provide novel
sequences with cellulase activity.

SEQ ID NO 17 to 19 show the sequences obtained from a cow rumen
sample.

3. PCR primers were designed on the basis of identified
conserved regions #2 (E/D)HLIFE and #3 RA(S/T)GGNN of cellulases
from the glycosyl hydrolase family 5 aiming to provide novel
sequences with cellulase activity.

SEQ ID NO 20 to 22 show the sequences obtained from a cow rumen
sample.

4. PCR primers were designed on the basis of identified
conserved regions #4 HTLVWH and #5 WDWNE of xylanases from the
glycosyl hydrolase family 10 aiming to provide novel sequences
with xylanase activity.

SEQ ID NO 23 to 28 show the sequences obtained from a cow rumen
sample.

5. PCR primers were designed on the basis of the identified
conserved regions #4 HTLVWH and #6 (F/Y)(I/Y)NDYN of xylanases
from the glycosyl hydrolase family 10 aiming to provide novel
sequences with xylanase activity.

SEQ ID NO 29 to 33 show the sequences obtained from a cow rumen
sample.

6. PCR primers were designed on the basis of the identified
conserved regions #5 WDWNE and #6 (F/Y)(I/Y)NDYN of xylanases
from the glycosyl hydrolase family 10 aiming to provide novel
sequences with xylanase activity.

SEQ ID NO 34 to 36 show the sequences obtained from a soil
sample. SEQ ID NO 37 to 45 show the sequences obtained from a
cow rumen sample.

7. PCR primers were designed on the basis of the identified
conserved regions #8 DGGTYDIY and #9 EGYQSSG of xylanases from
the glycosyl hydrolase family 11 aiming to provide novel
sequences with xylanase activity.

SEQ ID NO 46 to 49 show the sequences obtained from a soil
sample. SEQ ID NO 50 to 54 show the sequences obtained from a
cow rumen sample.

60 clones with inserts were sequenced and resulted in 43
different sequences, all encoding either a part of a cellulase
or a part of a xylanase. Only 2 of the 43 sequences were
similar to sequences found in the sequence database GenBank.
SEQ ID NO 49 was found to be similar to Xylanase A from
Bacillus pumilus. SEQ ID NO 42 was found to be similar to a
xylanase from Prevotella ruminicola.
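
The counts reported above can be turned into simple proportions. The sketch below uses only the three numbers given in the text (60 clones sequenced, 43 different sequences, 2 with a similar database entry); the function and key names are hypothetical.

```python
def summarize_screen(clones_sequenced, distinct_sequences, with_database_match):
    """Derive simple proportions from the sequencing counts."""
    without_match = distinct_sequences - with_database_match
    return {
        "distinct_per_clone": distinct_sequences / clones_sequenced,
        "without_database_match": without_match,
        "fraction_without_match": without_match / distinct_sequences,
    }


summary = summarize_screen(60, 43, 2)
print(f"{summary['distinct_per_clone']:.1%} of sequenced clones were distinct")
print(f"{summary['fraction_without_match']:.1%} of distinct sequences "
      f"had no similar database entry")
```

On these numbers roughly 72 % of the sequenced clones gave a distinct sequence, and about 95 % of the distinct sequences (41 of 43) had no similar database entry.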

Example 3

Construction of novel hybrid DNA sequences encoding
polypeptides with endoglucanase activity

Novel hybrid DNA sequences with endoglucanase activity were
provided by first identifying two conserved regions common to
the following family 45 cellulases (see WO 96/29397): Humicola
insolens EG V (disclosed in WO 91/17243), Fusarium oxysporum EG V
(Sheppard et al., Gene (1994), Vol. 150, pp. 163-167), Thielavia
terrestris, Myceliophthora thermophila, and Acremonium sp.
(disclosed in WO 96/29397).

The amino acid sequence alignment revealed two conserved
regions.

The first conserved region "Thr Arg Tyr Trp Asp Cys Cys Lys
Pro/Thr", shown in SEQ ID NO 57, corresponds to position 6 to 14
of SEQ ID NO 55 showing the Humicola insolens EG V 43 kDa
endoglucanase.

The second conserved region "Trp Arg Phe/Tyr Asp Trp Phe",
shown in SEQ ID NO 58, corresponds to positions 169 to 198 of
SEQ ID NO 55 showing the Humicola insolens EG V 43 kDa
endoglucanase.

Two degenerate, deoxyinosine-containing oligonucleotide
primers (sense: primer s; antisense: primer as) were
constructed for PCR amplification of unknown gene sequences. The
deoxyinosines are depicted by an I in the primer sequences.
Primer s and primer as are shown in SEQ ID NO 59 and 60,
respectively.

The Humicola insolens EG V structural gene sequence (SEQ ID
NO 55) was used as the known DNA sequence. A number of fungal
DNA sequences mentioned below were used as the unknown
sequences.

PCR cloning of the family 45 cellulase core region and the
linker/CBD of Humicola insolens EG V

Approximately 10 to 20 ng of double-stranded,
cellulase-induced cDNA from Humicola nigrescens, Cylindrocarpon sp.,
Fusarium anguioides, Gliocladium catenulatum, and Trichothecium
roseum, prepared as described above in the Materials and Methods
section, was PCR amplified in a DNA thermal cycler (Perkin-Elmer
Cetus, USA) in Expand buffer (Boehringer Mannheim, Germany)
containing 200 µM each dNTP, 200 pmol of each degenerate primer
(Primer s, SEQ ID NO 59, and Primer as, SEQ ID NO 60) and 2.6
units of Expand High Fidelity polymerase (Boehringer Mannheim,
Germany). 30 cycles of PCR were performed using a cycle profile of
denaturation at 94 °C for 1 min, annealing at 55 °C for 2 min,
and extension at 72 °C for 3 min, followed by extension at 72 °C
for 5 min.

The PCR fragment coding for the linker/CBD of H. insolens
EG V was generated in Expand buffer (Boehringer Mannheim,
Germany) containing 200 µM each dNTP using 100 ng of the pCaHj418
template, 200 pmol forward primer 1 (SEQ ID NO 61) and 200 pmol
reverse primer 1 (SEQ ID NO 62). 30 cycles of PCR were
performed as above.

Construction of hybrid genes using splicing by overlap
extension (SOE)

The PCR products were electrophoresed in 0.7 % agarose gels
(SeaKem, FMC), and the fragments of interest were excised from the
gel and recovered with the Qiagen gel extraction kit (Qiagen, USA)
according to the manufacturer's instructions. The recombinant
hybrid genes were generated by combining the overlapping PCR
fragments from above (ca. 50 ng of each template) in Expand
buffer (Boehringer Mannheim, Germany) containing 200 µM each
dNTP in the SOE reaction. Two cycles of PCR were performed
using a cycle profile of denaturation at 94 °C for 1 min,
annealing at 50 °C for 2 min, and extension at 72 °C for 3 min.
The reaction was stopped, 250 pmol of each end-primer (forward
primer 2, SEQ ID NO 63, encoding the TAKA-amylase signal
sequence from A. oryzae, and reverse primer 2, SEQ ID NO 64) was
added to the reaction mixture, and an additional 30 cycles of PCR
were performed using a cycle profile of denaturation at 94 °C
for 1 min, annealing at 55 °C for 2 min, and extension at 72 °C
for 3 min.

Construction of the expression cassettes and heterologous
expression in Aspergillus oryzae

The PCR-generated, recombinant fragments were electrophoresed
in 0.7 % agarose gels (SeaKem, FMC), and the fragments were
excised from the gel and recovered with the Qiagen gel extraction
kit (Qiagen, USA) according to the manufacturer's instructions. The
DNA fragments were digested to completion with BamHI and XbaI,
and ligated into BamHI/XbaI-cleaved pHD414 vector.
Co-transformation of A. oryzae was carried out as described in
Christensen et al. (1988), Bio/Technology 6, 1419-1422. The AmdS+
transformants were screened for cellulase activity using 0.1 %
AZCL-HE-cellulose in a plate assay as described above. The
cellulase-producing transformants were purified twice through
conidial spores and cultivated in 250 ml shake flasks, and the
amount of secreted cellulase was estimated by SDS-PAGE, Western
blot analysis and the activity assay as described earlier
(Kauppinen et al. (1995), J. Biol. Chem. 270, 27172-27178; Kofod
et al. (1994), J. Biol. Chem. 269, 29182-29189; Christgau et al.
(1994), Biochem. Mol. Biol. Int. 33, 917-925).

Nucleotide sequence analysis 

The nucleotide sequences of the novel hybrid gene fusions
were determined from both strands by the dideoxy
chain-termination method (Sanger et al. (1977), Proc. Natl. Acad.
Sci. U.S.A. 74, 5463-5467), using 500 ng template, the Taq
deoxy-terminal cycle sequencing kit (Perkin-Elmer, USA),
fluorescent labeled terminators and 5 pmol of synthetic
oligonucleotide primers. Analysis of the sequence data was
performed according to Devereux et al. (1984), Nucleic Acids
Res. 12, 387-395.

The novel hybrid DNA sequences provided and the deduced
protein sequences are shown in SEQ ID NO 65 to 74.

SEQ ID NO 65 shows the hybrid gene construct comprising the
family 45 cellulase core region from Humicola nigrescens and
the linker/CBD of Humicola insolens EG V. SEQ ID NO 66 shows
the deduced amino acid sequence of the hybrid gene construct.

SEQ ID NO 67 shows the hybrid gene construct comprising the
family 45 cellulase core region from Cylindrocarpon sp. and the
linker/CBD of Humicola insolens EG V. SEQ ID NO 68 shows the
deduced amino acid sequence of the hybrid gene construct.

SEQ ID NO 69 shows the hybrid gene construct comprising the
family 45 cellulase core region from Fusarium anguioides and
the linker/CBD of Humicola insolens EG V. SEQ ID NO 70 shows
the deduced amino acid sequence of the hybrid gene construct.

SEQ ID NO 71 shows the hybrid gene construct comprising the
family 45 cellulase core region from Gliocladium catenulatum
and the linker/CBD of Humicola insolens EG V. SEQ ID NO 72
shows the deduced amino acid sequence of the hybrid gene
construct.

SEQ ID NO 73 shows the hybrid gene construct comprising the
family 45 cellulase core region from Trichothecium roseum and
the linker/CBD of Humicola insolens EG V. SEQ ID NO 74 shows
the deduced amino acid sequence of the hybrid gene construct.

SEQUENCE LISTING

(1) GENERAL INFORMATION:
  (i) APPLICANT:
    (A) NAME: Novo Nordisk A/S
    (B) STREET: Novo Alle
    (C) CITY: Bagsvaerd
    (E) COUNTRY: Denmark
    (F) POSTAL CODE (ZIP): DK-2880
    (G) TELEPHONE: +45 4444 8886
    (H) TELEFAX: +45 4449 3256
  (ii) TITLE OF INVENTION: Method for providing novel DNA sequences
  (iii) NUMBER OF SEQUENCES: 74
  (iv) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 747 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (vi) ORIGINAL SOURCE:
    (B) STRAIN: Bacillus sp. AC13, NCIMB No. 40482
  (ix) FEATURE:
    (A) NAME/KEY: CDS
    (B) LOCATION: 1..747
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATG AGA CAA AAG AAA TTG ACG TTC ATT TTA GCC TTT TTA GTT TGT TTT 48
Met Arg Gln Lys Lys Leu Thr Phe Ile Leu Ala Phe Leu Val Cys Phe
1 5 10 15

GCA CTA ACC TTA CCT CCA GAA ATA ATT CAG GCA CAA ATC GTC ACC GAC 96
Ala Leu Thr Leu Pro Ala Glu Ile Ile Gln Ala Gln Ile Val Thr Asp
20 25 30



AAT TCC ATT GGC AAC CAC GAT GGC TAT GAT TAT GAA TTT TGG AAA GAT 144
Asn Ser Ile Gly Asn His Asp Gly Tyr Asp Tyr Glu Phe Trp Lys Asp
35 40 45



AGC GGT GGC TCT GGG ACA ATG ATT CTC AAT CAT GGC GOT ACG TTC ACT 192 
Ser Gly Gly Ser Gly Thr Met He Leu Asn His Cly Gly Thr Phe Ser 
50 55 60 



50 GCC CAA TGG AAC AAT GTT AAC AAC ATA TTA TTC CGT AAA GGT AAA AAA 240 
Ala Gin Trp Asn Asn Val Asn Asn He Leu Phe Arg Lys Gly Lys Lys 
65 70 75 80 



TTC AAT GAA ACA CAA ACA CAC CAA CAA GTT CGT AAC ATG TCC ATA AAC 288 
55 Phe Asn Glu Thr Gin Thr His Gin Gin Val Gly Asn Met Ser He Asn 

85 90 95 

TAT GGC GCA AAC TTC CAG CCA AAC GGA AAT GCG TAT TTA TGC GTC TAT 336 
Tyr Gly Ala Asn Phe Gin Pro Asn Gly Asn Ala Tyr Leu Cys Val Tyr 
60 100 105 iio 

GGT TGG ACT GTT GAC CCT CTT GTC GAA TAT TAT ATT GTC GAT AGT TGG 384 
Gly Trp Thr Val Asp Pro Leu Val Glu Tyr Tyr He Val Asp Ser Trp 
65 115 120 125 

GGC AAC TGG CGT CCA CCA GGG GCA ACG CCT AAG GGA ACC ATC ACT GTT 432 
Gly Asn Trp Arg Pro Pro Gly Ala Thr Pro Lys Gly Thr He Thr Val 
130 135 140 
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GAT GGA GGA ACA TAT GAT ATC TAT GAA ACT CTT AGA GTC AAT CAG CCC 480 
Asp Gly Gly Thr Tyr Asp He Tyr Glu Thr Leu Arg Val Asn Gin Pro 
145 150 155 160 

5 TCC ATT AAG GGG ATT GCC ACA TTT AAA CAA TAT TGG AGT GTC CGA AGA 528 
Ser Ho Lys Gly He Ala Thr Phe Lys Gin Tyr Trp Ser Val Arg Arq 
165 170 17S 

TCG AAA CGC ACG AGT GGC ACA ATT TCT GTC AGC AAC CAC TTT AGA GCG 576 
10 Ser Lys Arg Thr Ser Gly Thr He Ser Val Ser Asn His Phe Arg Ala 
180 185 190 

TGG GAA AAC TTA GGG ATG AAC ATG GGG AAA ATG TAT GAA GTC GCG CTT 624 
Trp Glu Asn Leu Gly Met Asn Met Gly Lys Met Tyr Glu Val Ala Leu 
15 195 20O 205 

ACT GTA GAA GGC TAT CAA AGT AGC GGA AGT GCT AAT GTA TAT AGC AAT 672 

Thr Val Glu Gly Tyr Gin Ser Ser Gly Ser Ala Asn Val Tyr Ser Asn 

210 215 220 

20 

ACA CTA AGA ATT AAC GGT AAC CCT CTC TCA ACT ATT AGT AAT GAC AAG 720 

Thr Leu Arg lie Asn Gly Asn Pro Leu Ser Thr He Ser Asn Asp Lys 

225 230 235 240 

25 AGC ATA ACT CTA GAT AAA AAC AAT TAA 747 
Ser He Thr Leu Asp Lys Asn Asn * 
245 
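
As a reading aid (not part of the original listing), the correspondence between the coding sequence of SEQ ID NO: 1 and the three-letter translation printed beneath it can be checked with a short Python sketch. It assumes the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for this illustration, and only the first and last rows of the sequence are used.

```python
# Standard genetic code (an assumption of this sketch), with codons
# enumerated in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter codes ('*' marks a stop)."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First row (positions 1-48) and last row (positions 721-747) of SEQ ID NO: 1.
first_row = "ATG AGA CAA AAG AAA TTG ACG TTC ATT TTA GCC TTT TTA GTT TGT TTT"
last_row = "AGC ATA ACT CTA GAT AAA AAC AAT TAA"

print(translate(first_row))  # MRQKKLTFILAFLVCF
print(translate(last_row))   # SITLDKNN*
```

The one-letter output corresponds to Met Arg Gln Lys Lys Leu Thr Phe Ile Leu Ala Phe Leu Val Cys Phe for the first row and Ser Ile Thr Leu Asp Lys Asn Asn followed by the stop codon for the last row.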



(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 249 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Arg Gln Lys Lys Leu Thr Phe Ile Leu Ala Phe Leu Val Cys Phe
1 5 10 15

Ala Leu Thr Leu Pro Ala Glu Ile Ile Gln Ala Gln Ile Val Thr Asp
20 25 30

Asn Ser Ile Gly Asn His Asp Gly Tyr Asp Tyr Glu Phe Trp Lys Asp
35 40 45

Ser Gly Gly Ser Gly Thr Met Ile Leu Asn His Gly Gly Thr Phe Ser
50 55 60

Ala Gln Trp Asn Asn Val Asn Asn Ile Leu Phe Arg Lys Gly Lys Lys
65 70 75 80

Phe Asn Glu Thr Gln Thr His Gln Gln Val Gly Asn Met Ser Ile Asn
85 90 95

Tyr Gly Ala Asn Phe Gln Pro Asn Gly Asn Ala Tyr Leu Cys Val Tyr
100 105 110

Gly Trp Thr Val Asp Pro Leu Val Glu Tyr Tyr Ile Val Asp Ser Trp
115 120 125

Gly Asn Trp Arg Pro Pro Gly Ala Thr Pro Lys Gly Thr Ile Thr Val
130 135 140

Asp Gly Gly Thr Tyr Asp Ile Tyr Glu Thr Leu Arg Val Asn Gln Pro
145 150 155 160

Ser Ile Lys Gly Ile Ala Thr Phe Lys Gln Tyr Trp Ser Val Arg Arg
165 170 175

Ser Lys Arg Thr Ser Gly Thr Ile Ser Val Ser Asn His Phe Arg Ala
180 185 190

Trp Glu Asn Leu Gly Met Asn Met Gly Lys Met Tyr Glu Val Ala Leu
195 200 205

Thr Val Glu Gly Tyr Gln Ser Ser Gly Ser Ala Asn Val Tyr Ser Asn
210 215 220

Thr Leu Arg Ile Asn Gly Asn Pro Leu Ser Thr Ile Ser Asn Asp Lys
225 230 235 240

Ser Ile Thr Leu Asp Lys Asn Asn *
245



(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Conserved region"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Asp Gly Gly Thr Tyr Asp Ile Tyr

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Conserved region"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Glu Gly Tyr Gln Ser Ser Gly
1 5



(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer t
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCGAATTCAT GAGACAAAAG AAATTGACG 29

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer a rc"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AACAGTGATG GTTCCCTTAG GC 22

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer f"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTAGAGTCGA CTTAATTGTT TTTATCTAGA G 31

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer d rc"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AACAGTGATG GTTCCCTTAG GC 22

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer ab"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCCTAAGGGA ACCATCACTG TTGAYGGXGG XACXTAYGAY AT 42

(Y = C or T, X = 25% A and 75% inosine)
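
The structure of primer ab can be made explicit with a small Python sketch, offered only as an illustration: its first 22 nucleotides are the reverse complement of primer a rc (SEQ ID NO: 6), and the remaining 20 positions form the degenerate tail. The helper `reverse_complement` and the variable names are hypothetical; X is kept as the listing's own symbol for the A/inosine mixture and is not expanded.

```python
# Complement table for the four unambiguous bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_A_RC = "AACAGTGATGGTTCCCTTAGGC"                     # SEQ ID NO: 6
PRIMER_AB = "GCCTAAGGGAACCATCACTGTTGAYGGXGGXACXTAYGAYAT"  # SEQ ID NO: 9

anchor = PRIMER_AB[:22]  # non-degenerate 5' part
tail = PRIMER_AB[22:]    # degenerate 3' part containing Y and X

print(reverse_complement(PRIMER_A_RC) == anchor)  # True
print(tail)                                       # GAYGGXGGXACXTAYGAYAT
print(len(PRIMER_AB), tail.count("Y"), tail.count("X"))  # 42 3 3
```

Read in triplets, the tail (GAY GGX GGX ACX TAY GAY AT) is consistent with the first residues of the conserved region of SEQ ID NO: 3 under the standard genetic code, ending in the first two bases of the isoleucine codon.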

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer cd"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AATGCTATAT ACATTAGCAC TTCCXSWXSW YTGGTAXCCY TC 42

(S = G or C, W = A or T, Y = C or T, X = 25% A and 75% inosine)
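
To illustrate what the ambiguity codes in primer cd imply, the following Python sketch counts how many distinct oligonucleotides the S, W and Y positions represent and enumerates the combinations for a short example. It is a sketch under stated assumptions: the `IUPAC` table covers only the codes used here, the function names are hypothetical, and the X positions (the A/inosine mixture defined in the listing) are skipped rather than expanded.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes used in primer cd.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "CG", "W": "AT", "Y": "CT"}


def degeneracy(seq: str, skip: str = "X") -> int:
    """Number of distinct sequences encoded, ignoring the symbols in `skip`."""
    count = 1
    for base in seq:
        if base not in skip:
            count *= len(IUPAC[base])
    return count


def expand(seq: str):
    """Yield every unambiguous sequence for a string of IUPAC codes (no X)."""
    for combo in product(*(IUPAC[base] for base in seq)):
        yield "".join(combo)


PRIMER_CD = "AATGCTATATACATTAGCACTTCCXSWXSWYTGGTAXCCYTC"  # SEQ ID NO: 10

print(len(PRIMER_CD))         # 42
print(degeneracy(PRIMER_CD))  # 64
print(sorted(expand("SWY")))
# ['CAC', 'CAT', 'CTC', 'CTT', 'GAC', 'GAT', 'GTC', 'GTT']
```

The count of 64 comes from six two-fold positions (two S, two W and two Y); the three X positions are not included in that figure.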

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 747 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: hybrid DNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..747
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

ATG AGA CAA AAG AAA TTG ACG TTC ATT TTA GCC TTT TTA GTT TGT TTT 48
Met Arg Gln Lys Lys Leu Thr Phe Ile Leu Ala Phe Leu Val Cys Phe
1 5 10 15

GCA CTA ACC TTA CCT GCA GAA ATA ATT CAG GCA CAA ATC GTC ACC GAC 96
Ala Leu Thr Leu Pro Ala Glu Ile Ile Gln Ala Gln Ile Val Thr Asp
20 25 30

AAT TCC ATT GGC AAC CAC GAT GGC TAT GAT TAT GAA TTT TGG AAA GAT 144
Asn Ser Ile Gly Asn His Asp Gly Tyr Asp Tyr Glu Phe Trp Lys Asp
35 40 45

AGC GGT GGC TCT GGG ACA ATG ATT CTC AAT CAT GGC GGT ACG TTC AGT 192
Ser Gly Gly Ser Gly Thr Met Ile Leu Asn His Gly Gly Thr Phe Ser
50 55 60

GCC CAA TGG AAC AAT GTT AAC AAC ATA TTA TTC CGT AAA GGT AAA AAA 240
Ala Gln Trp Asn Asn Val Asn Asn Ile Leu Phe Arg Lys Gly Lys Lys
65 70 75 80

TTC AAT GAA ACA CAA ACA CAC CAA CAA GTT GGT AAC ATG TCC ATA AAC 288
Phe Asn Glu Thr Gln Thr His Gln Gln Val Gly Asn Met Ser Ile Asn
85 90 95

TAT GGC GCA AAC TTC CAG CCA AAC GGA AAT GCG TAT TTA TGC GTC TAT 336
Tyr Gly Ala Asn Phe Gln Pro Asn Gly Asn Ala Tyr Leu Cys Val Tyr
100 105 110

GGT TGG ACT GTT GAC CCT CTT GTC GAA TAT TAT ATT GTC GAT AGT TGG 384
Gly Trp Thr Val Asp Pro Leu Val Glu Tyr Tyr Ile Val Asp Ser Trp
115 120 125

GGC AAC TGG CGT CCA CCA GGG GCA ACG CCT AAG GGA ACC ATC ACT GTT 432
Gly Asn Trp Arg Pro Pro Gly Ala Thr Pro Lys Gly Thr Ile Thr Val
130 135 140

GAC GGG GGG ACG TAT GAT ATC TAC AAG CAC CAA CAG GTC AAT CAG CCA 480
Asp Gly Gly Thr Tyr Asp Ile Tyr Lys His Gln Gln Val Asn Gln Pro
145 150 155 160

TCT ATT CAG GGC ACC GCC ACC TTC AAT CAG TAC TGG TCG ATT CGA CAG 528
Ser Ile Gln Gly Thr Ala Thr Phe Asn Gln Tyr Trp Ser Ile Arg Gln
165 170 175

AGC AAG CGG ACC AGC GGC ACT GTC ACT ACG GCA AAC CAC TTT AAT GCC 576
Ser Lys Arg Thr Ser Gly Thr Val Thr Thr Ala Asn His Phe Asn Ala
180 185 190

TGG GCT GCT CTT GGC ATG AAT ATG GGT GCA TTC AAT TAC CAG ATC CTC 624
Trp Ala Ala Leu Gly Met Asn Met Gly Ala Phe Asn Tyr Gln Ile Leu
195 200 205

GTT ACT GAG GGC TAC CAA TCT ACC GGA AGT GCT AAT GTA TAT AGC AAT 672
Val Thr Glu Gly Tyr Gln Ser Thr Gly Ser Ala Asn Val Tyr Ser Asn
210 215 220

ACA CTA AGA ATT AAC GGT AAC CCT CTC TCA ACT ATT AGT AAT GAC AAG 720
Thr Leu Arg Ile Asn Gly Asn Pro Leu Ser Thr Ile Ser Asn Asp Lys
225 230 235 240

AGC ATA ACT CTA GAT AAA AAC AAT TAA 747
Ser Ile Thr Leu Asp Lys Asn Asn *
245

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 249 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Arg Gln Lys Lys Leu Thr Phe Ile Leu Ala Phe Leu Val Cys Phe
1 5 10 15

Ala Leu Thr Leu Pro Ala Glu Ile Ile Gln Ala Gln Ile Val Thr Asp
20 25 30

Asn Ser Ile Gly Asn His Asp Gly Tyr Asp Tyr Glu Phe Trp Lys Asp
35 40 45

Ser Gly Gly Ser Gly Thr Met Ile Leu Asn His Gly Gly Thr Phe Ser
50 55 60

Ala Gln Trp Asn Asn Val Asn Asn Ile Leu Phe Arg Lys Gly Lys Lys
65 70 75 80

Phe Asn Glu Thr Gln Thr His Gln Gln Val Gly Asn Met Ser Ile Asn
85 90 95

Tyr Gly Ala Asn Phe Gln Pro Asn Gly Asn Ala Tyr Leu Cys Val Tyr
100 105 110

Gly Trp Thr Val Asp Pro Leu Val Glu Tyr Tyr Ile Val Asp Ser Trp
115 120 125

Gly Asn Trp Arg Pro Pro Gly Ala Thr Pro Lys Gly Thr Ile Thr Val
130 135 140

Asp Gly Gly Thr Tyr Asp Ile Tyr Lys His Gln Gln Val Asn Gln Pro
145 150 155 160

Ser Ile Gln Gly Thr Ala Thr Phe Asn Gln Tyr Trp Ser Ile Arg Gln
165 170 175

Ser Lys Arg Thr Ser Gly Thr Val Thr Thr Ala Asn His Phe Asn Ala
180 185 190

Trp Ala Ala Leu Gly Met Asn Met Gly Ala Phe Asn Tyr Gln Ile Leu
195 200 205

Val Thr Glu Gly Tyr Gln Ser Thr Gly Ser Ala Asn Val Tyr Ser Asn
210 215 220

Thr Leu Arg Ile Asn Gly Asn Pro Leu Ser Thr Ile Ser Asn Asp Lys
225 230 235 240

Ser Ile Thr Leu Asp Lys Asn Asn *
245

(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 409 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS1/9
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAATTCGGCT TGGGTGGAAT CTGGGGAACA CGTTGGATGC TACCGGAGAC TGGATCAAAG 60
GGCCGTCCGT GAGCGCCTAC GAGACCGCCT GGGGCAATCC CGTCACCACC AAGGCTATGT 120
TCGACGGCAT CAAAGCGTCC GGCTTCAACT TTGTTCGCAT TCCCGTGGCG TGGTCCAACA 180
TGATGGGCCC GGACTATACC ATTAACCCGG CGTTGATGGC GAGAGTCGAG AAGTGGTGAA 240
TTACGGTCTG GCCGACAACA TGTATGTCAT GATCAACATC CACTGGGACG CGGCTGGATC 300
ACTAAATTCC CACCAACTAC GACGAAAGCA TGAAGAAGTA TAAGGCGGTC TGGAGCCAGA 360
TCGCCGACCA TTTCAAAGCT ACTCCGACCA CCTCATCTTC GAAAAGCCG 409



(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 408 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS1/12
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

AATTCGGCTT GGGTGGAATC TGGGGAACAC TCTGGAAGCC TGCGGCGGGA TCAAATGCAG 60
TTCCGTGCGC GATTTCGAGA CGGCTTGGGG CAACCCCGTC ACGACCAAGG CCATGATCGA 120
CGGCGTCAAG GCGGCCGGCT TCAGGTCCAT ACGCATCCCC GTCGCCTGGT CGAACCTGAT 180
GGGACCTAAG CCCGACTACA CTATCAATAA GAAGCTGATG GCACGAGTCG AGCAGGTCGC 240
CCGGTACGGC CTCGACAACG ACATGTACGT CATCATCAAC ATTCACTGGG ACGCGGCTGG 300
ATCCACCCCT TGTCGACCGA CTACAACGAA ATGCAX-GARG AATTACAAGG CGGTGTGGGG 360
CCAGGTAGCC GACCATTTCA AGGGCTACTC CGACCACCTC ATCTTCGA 408



(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 416 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KN1/9
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AATTCGGCTT CTCGAAGATG AGGTGGTCGG AGTAGCCTTT GAAATGGTCG GCGATCTGGC 60
TCCAGACCGC CTTATACTTC TTCATGCTTT CGTCGTAGTT GGTGGGGAAT TTAGTGATCC 120
AGCCGCCGTC CCAGTGGATG TTGATCATGA CATACATGTT GTCGGCCAGA CCGTAATTCA 180
CCACTTCCTC GACTCTCGCC ATCAACGCCG GGTTAATGGT ATAGTCCGGG CCCATCATGT 240
TCGACCACGC CACGGGAATG CGAACAAAGT TGAAGCCGGA CGCTTTGATG CCGTCGAACA 300
TAGCCTTGGT GGTGACGGGA TTGCCCCAGG CGGTCTCGTA GGCGCTCACG GACGGCCCTT 360
GATCCAGTC TCCGGTAGCA TCCAACGTGT TCCCCARATT CCACCCAACC CGAATT 416



(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 490 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM1/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

AATTCGGCTT GTTCCGCAAG CGTCAAAGGG GATGTGATGT ACCAGATCAA GGCAAAGCTC 60
GGTCTGAAAT AAAACTAGTC AAAACTAGCC AAAACTAGTC AGGCTAGTCA GAACCAGTTA 120
GCACAATCGT AAAAACTAAA AGTATGAGCG ACGGCAATTT CAACCGCGCC CTCCTGCCGA 180
AGAACGAACT CTCTGCAGGA CTCAGGGCTG GCAAAGCACA GATGCGCACC AAGGCTGAAA 240
CAGGCGTTGG AGACTGTACT CGACNAATAC TTCCCCTCTG CCGACATCTC GCTCCGAAAC 300
GCAATCCACG AACGATCCTC CAACTCTTAC AACAGTAGGA CAAAGGTGAA ACGTATTTAA 360
TTATGCTTCC TGAATTNTCA TTAACACNAT GCCTGTGTGG CACCCATCCG CGTNTTCAAT 420
GGTGTTCACC AGGGCATCCT TTACTCATCC CACAGGTTAA GCAANTGGCC AAANAACACC 480
GTCCGGCTTC 490

(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 492 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KN2/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

AATTCGGCTT GTTGTTGCCG CCGCTGGTGC GGACCACGTC AATAAAAGTC TCCTTGTAAG 60
AATTCTGCAC AGCCAGATTC TCAGGCTCGG GCTTGCCCCA GTTATCGCGC AGGTCAACCT 120
CGTTAGTACC AGCAAAGGCT ACGCGGTAGT CGTAGTTGGC AAACTCGCTG GCGATATTCA 180
GCCACAGCAG GGCGAGTTTC TGGTTGTTCT CGTCCTTGTA CTGATAGGTA GGACRACCCT 240
CCAGCCACTT GTCGTGATGC GTATTGATGA TGACTTTTAG GTCATTCTCG AAGCACCARC 300
CCACAACCTC TTTGATACGT GCCAGCCAAG CCTTGTCAAT GCTCATGGCA ACGGGATTGG 360
TGATGTTGCA CTGCCACCGG AMSGGAATGC GGATGGCGTT RAAAC; TGCA TCCTTGACTG 420
CCTTGATAAC TTTTTTGTTA CAACGGGATT GCCCCATGCC GTCTCACCCT TAATACTGTT 480
CTCATACATC CG 492

(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 574 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM2/5
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AATTCGGCTT GTTGTTGCCG CCGGTGGTAC GGATGGTGTT CACCACCAAC TGGTTCCACT 60
CGTTGAGGGT TTTATACTGC TTACCGCCAT CGGTACGGTT TGCGCCCCAT CCCCAGCCGC 120
CGTCCTGAAT CTCGTTGAAC GACTCGAATA TGAGGAATTC GCCCTTGTCC TTGAAGGCTT 180
CGGCAATCTG TTTCCANGTT TTCTCAATAC GGTTCTTGAT GTTGCTGTTG GTCGTTGAAT 240
TGTTGGCAGC GCCCTTAATG TCAACCAGTA CTCATCGTGA TGCATGTTCA GGATNACNTT 300
CAGTCCGGCA CTTCGGCCCA CTCCACATTC TGCCTGACTT CTGCTATGTA TTTAGCATCT 360
ATCCCCATTC CAAATGTTTC TGGTANTTGC CCATGTTACC CGANACTTAN GTGCTGGCAC 420
AACGTTTTTA NGTTTCTTAA AAACCGCAAA GGCTTGGCAT TTCCAATATC CCANTGGGGA 480
ACCNAACNTC NCACCCNGCC GGTACAAATG GTNCCCCNTT TCCCCCAACC CAAATCCNCC 540
NCNGGGGGCC GTTACNATTG NATCNAACCG GTAC 574



(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 520 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM2/6
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:



GTTGTTGCCG 
GTTGTAGGCC 
ACACCAGGAG 
GCAGTCCCGG 



CCGGTCGTTC 
GATGTGGCTA 
CTCAAGGGAT 
AATTCCTGTG 



TCACGGTGGT 
TGGCTTCGTT 
CCAGCATCTC 
CTATCTGCTG 



TG AG CAT AN C 
G TAG CG GC AA 
TCGAAGAGCA 
TCATANCGGG 



AATTCGGCTT 
TGTTGATGGC 
AGGATGCGAA 

AGCGCTGTCC „ 

AGCGGTTCAN CGCGTATTTG TCCTCGGANG CCTTGATCCA CNACTTGAAA CNANTTGCTg" 300 

TCTGCGCCCG 

ATCACCACTT 
TTCTTATACC 
AACAANGGGT 



TGTCGTGGTG AACGTTGAAT NATGCAGTAC 
CATGCACGCG GGCCATCCAC GCCNCATCCA 
ACTTCATCGC CCACGGATGG CACCAAACCC 
GGTGGGATAT TAACCCAACA GGTCCGAAGA 



GACGAAGCTC 
GTACCTGCCG 
GTTGAAGCTC 
CCAC AG ACGT 



AAGCCCTGGT 
CNTTGCCGGC 
GGATCTTTNT 



CTAGGANACT 
GCTGTCCATN 
CNTCCTGAAN 
520 



(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 194 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM3/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

AATTCGGCTT GAGCACCTGA TTTTTGAGGG CTACAACGAG ATGCTCGACA AGTATGACTC 60
CTGGTGTTTT GCCACCTTCG GACGCTCGGC AGGCTATAAC GCTACAGACG CCGCCGATGC 120
CTATAAAGCC ATCAACAACT ATGCCCAGAG CTTCGTCAAC GCCGTACGCA CCACCGGCGG 180
CAACAACAAG CCG 194

(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 160 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM3/8
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

AATTCGGCTT GAGCACTTGA TTTTCGAGGC CTACAACGAG ATGCTCGATG CCCAGAGCTC 60
GTGGAACTTT GCCCAGACCA GCACAGCCTA TGATGCTATC AACAACTATG CCCAAAGCTT 120
CGTCAACATT GTTCGTACCA GCGGCGGCAA CAACAAGCCG 160

(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 193 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM3/9
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

AATTCGGCTT GAGCATTTGA TCTTCGAGAG TTACAACCAG ATGCTCGATA CGGAAGATTC 60
CTGGTGCTTC GCCTCGTTTG CAGCGCAGGG CAGTTACAAT GCCACCATCG CGCGTTCGGC 120
CTACAACGGC ATTAATAGCT ATGCGCAGAC TTTCGTCAAC ACCGTACGTA CCACCGGCGG 180
CAACAACAAG CCG 193

(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 166 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

AATTCGGCTT CAYACGCTGG TGTGGCACTC TCAGATCGGT CGTTGGATGA CTGCCGAGGG 60
TACAACCAAG GAGCAGTTCT ATCCTCGTAT GAAGAACCAT ATCCAGGCTA TCGTTACTCG 120
TTACAAGGAT GTGGTGTACT GCTGGGACGT CGTCAACGAG AAGCCG 166

(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 178 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

AATTCGGCTT CTCGTTAACG ACGTCCCAGG CATCGATCTT ACCGCAGAAA TGGCCGGCTA 60
CCGTCTCTAT GTAACTGCGC ATGGTCTCAA CCATCTCATC GTGGCTCTTG GGAGTGCCGT 120
CAGCGTGGTT GAAAAAGAAA TCGGGAGTCT CATTGTGCCA CACCAGCGTA TGAAGCCG 178

(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 181 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/4
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

AATTCCCCTT CAYACGCTGG TGTGGCACTC GCAGGCACCC GACTGGTGGT TTACCAACGG 60
CTATGCTGCC AGCCCTGTCT CAAAGGAAGT GCTGAAAGAG CGGCTCATCA AGCATATTAA 120
GACCGTTGTT GGCCATTTCA AGGGCCAAGT CTTTGGCTGG GACGTCGTCA ACGARAAGCC 180
G 181



(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 199 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/7
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

AATTCGGCTT CATACGTTGG TGTGGCACAA TCAGACGCCG GCCTGGTTCT TCCGCAGGGG 60
CTACAACGAG AACCTGCCTC TGGCGGACCG CGAGACCATG CTGGCGAGGC TGGAGAGCTA 120
TATCCGCGGT GTGCTGACCT ATGTGCAGGA GAATTATCCC GGGATCGTCT ACGCCTGGGA 180
CGTCGTCAAC GAGAAGCCG 199



(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 185 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/8
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

AATTCGGCTT GGCACGGACA GACGCCGCAG TGGTTCTTCT ACGAGAACTA TAATACTTCA 60
GGAAAACTTG CAAGCAGGGA AACGATGCTG GCAAGAATGG GAAACTATAT TAANGGCGTG 120
CTTGGCTTCG TGCAGGACAA TTATCCCGGC GTCATCTATG CGTGGGACGT TGTCAACGAG 180
AACCG 185



(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 208 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM4/9
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

ATCTGCAGAA ATTCGGCTTC TCGTTAACGA CGTCCCATGC ATAGATGACA CCCGGATATT 60
CACTCTGGAT AAAACCAAGC ACACCCTTTA TATAATTTTC AAGTCTGGCA AGCATGGTCT 120
CTCTGTCGGT ATAGGGAAAT GACTCGTTAT AGTGCTCACA GAAAAACCAC TTCGGTGTCT 180
GATTGTGCCA CACCAGCGTA TGAAGCCG 208



(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 310 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KK5/1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

AATTCGGCTT GTTGTAGTCG TTGTAGTACA GCTTGCAGTT TGAAGGAGCG TACTTTCTTG 60
^^J?TGAA CGCTTTCTCA ATAAATGCGT TGCTGCCGTA AACCTGTACC CAAGGGANAA 120 
GCGCCGTTGC CGTACCCGGA ACTCTTGCTC CGCCGTTGTT ACGTGTTCTG TTGGAGTCAC 180 
ANAAAATACA CTCGTTGCAG ACATCTAAAG CTTAAAGGTT AATCCGGGAT ACTGTGACTG 240 
5 £2£? CCCGAA CATATC TTGA AGTTACCTTC CAGTCCNGGT CCATACGGAA TGCTACCAGC 300 
TTCGCCGTCC 

(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 384 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM5/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

AATTCGGCTT GTTGTANTNG TTGWWGAAGA NGTGGCAGNT TGCCSSTcrc rmTr^nnn an 
20 S2SSJ£ ^? CTTTOC * ATORAGCTGT TgVcAC^TA ^CCWCACC CACGGGGACT iSS 
SJSSS™ G J AA " CGGC TCAC GG6CGC CGCCTGCACC ACGCGTACGC GCATCGCTGT 180 
CCGAGATACA CTCGTTGCAG ACGTCGTARG CGTANARCTT CAGCGTCNGA TAGTTGTTCT 240 
S^^TGC AAMCATATTG TCAATGTANC YCTTGANGCG CTGGTTCATG ACAGTGGANT 30O 
25 AN^S C GC « GAMGm TCCTTGAAAN AACCAGANCG GARTCTGGRA 360 

(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 354 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM5/4
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

^SlZ^ll CATACGTTGG TGTGGCACAA TCAGACGCCC GTATGGTTTT TTAAGGAAAA 60 

^^S^^I GACTGGAACG CGCCTGCCGC CCCCAAAGAA ATCCTGCTCG CCCGCCTGGA 120 

AAACTATATC CGGGATGTCA TGCGGCATGT GAATACCTGT TTCCCCGGTG TGGTCTACAC 180 

40 ^^ CGAAG CC * TCG **<* GGGGCAGGGC GGTCCCGGCC 1*0 
CCGCAATCCC TGGT TTGCTT TCACAGGCCA NGATTTCCTG CCGGCTGCCT TCCGGGCCCC 300 
CGCGAAAACN AAGTCCOGGG ACAGAACCTG TGCTACAACG ACTACAACAA GCCG 354 

(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 374 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM5/5
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

55 AATTCGGCTT CAT ACG CTGG TGTCGCACAG CCAGACTCCT GACTGGTTCT TCAAGGAGAA 60 
CTTCACCTCA AACGGTCAGC TCG TATCAAA GGATATAATG ^TCAGCTTA TCGAAAACTA 120 
CATCAAGAAC GTATTCACAA TGCTCAATGC AG AGTATCC T ACAGTTCACT TCTATGCTTA 180 
CGATGTAGCT AACGAGTGTA TGGCTGACAG CAGAAACGGC GGTCTCAGAC OCGTO™0« 240 

fiO ^55*™** CCCCATGG ™ TCTTATCTAC GGCGACM^ GCTA^Sa S3S 

ACTACAACAA c^g 0 ™ 0 ™* AGAAATTA ™ CTCCTGCTGG CTGCNAACTT TTCTTCAACG 360 

374 

(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 376 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear






WO 97/43409 PCT/DK97/00216 




(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM5/6
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:



CATACGCTGG TGTGGCACAG CCAGACTCCC GAGTGGTTCT TCAAGGAGGA fin 
5"CGACGAG AAGAAGGATT ACGTTTCTCC CGAAAAGATG AMaVgVgtA iSm llo 
CATCAAGAGC TTCTTCACAA CACTTACAGA GCTCTATCCC GAOSTTGACT TCTATGCMG 1 ftn 
™S^ A ******** GGACAGACGA CGGAAAGCCC COTGAGGCAG W(Sc 2« 
ACAGTCCAAC AACTACGGCG CTTCCGACTG GGTTGCTGTA TTCGGCGACA A^CATTCAT ■ 
10 CGA^C J^ GTAT ° CAAGAAAGTA TGCTCCCGA^C^ 360 

(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 166 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS6/3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

y™ CCTT TGGGATGTGG TGAACGAGGC CTTCAACGAA GACGGTTCAC GGCGCAGCCA SO 
^TTTCCAG AATGTGCTCG GCAACGGCTA TATCGAGCAG GcTt^AGGA COTTCTOTGC if 2 
2 5 GGCTGACCCC AATGCCAAAC TGTGCTACAA CGACTACAAC AAGCCG CCGCGCGTGC 120 

(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 151 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS6/5
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

AATTCGGCTT GTTGTAGTCG TTGTTGAACA GGCGGGTGGT TGGGTCTACC TC ATC AC PA ft fin 

40 lbi 
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 166 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS6/13
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

50 

AATTCGGCTT GTTGTAGTCG TTGTAGCACA GTTTGGCATT GGGATCTGTA ACCCGTGCAG 6ft 
CTTTGAATGC CTCTTCAATA TAG CT AT TG C CAATCAGCCG TTCCAAGATT ^J^»TO C 12Q 
GTGAGCCATT CTCTTCGAAG GCCTCATTCA CCACATCCCA W^GCCG GAGGCACGCC^ 120 

(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 250 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: NS6A/1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

65 AATTCGGCTT GTTGTAGTCG TTGWTGMAGA GTTTTACATC TTTTGGACCA TATTTrrTAr fin 

CCAGACGACA GGCCTGACGG ACGTAGTCGA TATCACCC^G A^AG^OTCC C^TAG^T 12^ 

GtJgSS?^ r^l^l GTGGCATCTG GATTACCATT AGGA?tSaC ^AG^T llo 

CATCCC^M GTAGTTGCCT TGTCCGTCAT CACCACCACC AGAGATCGCC TCRTTCACCA 240

(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 247 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6A/4
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

l^GGAYGTGG TGAAYGAGGC GATAGAGCTT AACGA C AAG A CCGAAACCGG 60 
ACTTCGTAAT TCATACTGGT ATCAAATAAT CGGTGACGAT TTCATATATT ACGCATTTCG 120 
15 * GACGCAAGAG AGGAACTGTG CGTTAAATAT GCGGCCGAGT ACGGCATTGA 180 

^TTCGGAC AAAGAAGCGC TTAAAGCCAT CCGCCCCGCT TTCTGCAACA ACGACTACAA 240 
CAAGCCG 247 

(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 238 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6A/5
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

in ^ll^Zll TGGGATGTGG TGAACGAGGC TATCTCGGGT GGCCACAGTG ACGG CGACGG 60 
30 CTCC AG CAT T CCGAGGGCTA TAAGAACGGC ACTTGGGATG T AGO CGGCG A 120 

E^"!"™ * GGCAGG ACT ACATGGGCGA CCTGGATTAC GTRCGTCAGG CTTGCCGACT 180 
GGCCCGCAAA TACGGCCCTG AGGATGTGAA GCTYTKCATC AACGACTACA ACAAGCCG 238 

(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 226 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6A/7
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

AATTCGGCTT GTTGTAGTCG TTGATGCACA ACAGGGCATT GGGGTCGGCC TCACGGGCAA 60 

5 ™S ^« CGTCG C OGCAGAGTTT GTAATGACGA ^TCTCACC^T 120 

AGGGGCTGGG AGCCTGACCT GGACGGCCTC CGAAACCGCC AAAGCCACCA AAGCCACCAA ISO 

AGCCGCCACC GTCGGAAATG G C CTCGTTC A CTACATCCCA^AGCOT AAGCCACCAA 180 



(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 205 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6B/1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

60 SSSES SSS^S ^S5SS TXSSSS: 'ESSSS, ,55 

3SS55 £g22?J& a ™ — «3 S^SSS SSSSi III 

65 

(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 235 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6B/2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

AATTCGGCTT GTTGTAGTCG TTGATGAAGA GCTTCATATC CTGTGGACCA TACTTGCGAG 60
CCAGCTTAAC GGCAGTACGA ACATAGTCGA TATCGCCCAG ATAATCCTGC CAGAAGAAGC 120
TCTCGGTTGC AGCCTTTTCT GGATCTTCCT GATCCTTCAG GTGCTGCAAA GCATATACGC 180
CCTCAGCATC GGCATGTCCC CTTGAGAGTG CCTCGTTCAC CACATCCCAA AGCCG 235

(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 244 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6B/3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

AATTCGGCTT GTTGTAGTCG TTGATGAANA GTTTCAAGTC TTCCGGGTTG CCCTTGAAGT 60
GCTTGCGCGC ACTCTTAACC GCCGTACGCA CGTATTCGAN GTCGCCCATA TCGTCCTGCC 120
AAAAGAANAG CCATTCTGCA CTGAAGTCGG GTCGGTGTTG CGGCTACTGT TGTGCTGAAN 180
GGGATAATTG CCCTGCCCAT CGTTGCCGCC GCCAGANATA CCTCGTTCAC ACGTCCCAAA 240
GCCG 244

(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 212 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6B/4
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

AAATTCGGCT TGTTGTAGTC GTTGATGTAC AGGACCGGGG CTTTGCCGTA CTTGGCGCAA 60
GCCTCTGTTG CATAGGCGAA TGCAGCATCA ACCCAGTCTT TGGTGCTCGG GTAATAATTG 120
CCCCAGACAA AGTCGTTGCC AGATGCTCCC TGGGTGCGGA ATGCCCCGCC GGCACCGTCT 180
GCAAAGGTCT CGTTCACCAC GTCCCAAAGC CG 212

(2) INFORMATION FOR SEQ ID NO: 45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 190 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM6B/5
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

AATTCGGCTT GTTGTAGTCG TTGTAGAACA GACCTGCATT AGGATCAGCC TCGTGAGCAA 60
ACTGGAATGC CTTGAGGATG AACTCGTCAC CGCAGAGCTG ATAAGCGGTT GACTGACGGA 120
ATGACTGCTC GTAAGGAACA TCGGGGTTGT TGCCGTCGCT CATTGCCTCG TTTACCACGT 180
CCCAAAGCCG 190

(2) INFORMATION FOR SEQ ID NO: 46: 
(i) SEQUENCE CHARACTERISTICS: 
60 (A) LENGTH: 234 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

( D ) TOPOLOGY : 1 inear 
(ii) MOLECULE TYPE: Hybrid DNA 

65 (vi) SCIENTIFIC NAME: NS8/1 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

AATTCGGCTT GACGGGGGGA CGTAYGAYAT CTACGAGACC ACCCGCTACA ACGAACCCTC 60 
CATCATCGGC ACCGCCACCT TCAACCAGTA CTGGAGCGTG CGCCAGTCCA GGCGCACCGG 120 
CGGCACCATC ACCACCGGCA ACCACTTCGA CGCCTGGGCC AGCCACGGCA TGAACCTGGG 180
CACCTTCAAC TACCAGATCC TGGCCACCGA RGGCTACCAA TSCTSCGGAA GCCG 234 



(2) INFORMATION FOR SEQ ID NO: 47:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 234 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Hybrid DNA

(vi) SCIENTIFIC NAME: NS8/6

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

AATTCGGCTT GACGGGGGRA CGTACGACAT CTACGAGCAC CAGCAAGTCA ACCAGCCCTC 60

CATCCAAGGC ACTGCGACCT TCAACCAGTA CTGGTCCATC CGCCAGAGCA AGCGTTCCAG 120 

CGGCACTGTG ACCACTGCCA ACCACTTCAA TGCTTGGGCC AAGTTGGGAA TGAACCTGGG 180
CAACTTCAAC TACCAGATTG TTTCCACTGA RGGCTACCAG WCCTSCGGAA GCCG 234 

(2) INFORMATION FOR SEQ ID NO: 48:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 234 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Hybrid DNA 

(vi) SCIENTIFIC NAME: NS8/11

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

AATTCGGCTT GACGGGGGGA CGTATGATAT CTACAAGCAC CAACAGGTCA ATCAGCCATC 60

TATTCAGGGC ACCGCCACCT TCAATCAGTA CTGGTCGATT CGACAGAGCA AGCGGACCAG 120 

CGGCACTGTC ACTACGGCAA ACCACTTTAA TGCCTGGGCT GCTCTTGGCA TGAATATGGG 180
TGCATTCAAT TACCAGATCC TCGTTACTGA GGGCTACCAA TCTACCGGAA GCCG 234

(2) INFORMATION FOR SEQ ID NO: 49:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 213 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Hybrid DNA

(vi) SCIENTIFIC NAME: NS8/12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

AATTCGGCTT GACGGGGGGA CGTACGACAT TTATGAAACA ACCCGTGTCA ATCAGCCTTC 60
CATTATCGGG ATCGCAACCT TCAAGCAATA TTGGAGTGTA CGTCAAACGA AACGTACAAG 120
CGGAACGGTC TCCGTCAGTG CGCATTTTAG AAAATCGGAA AGCTTAGGGA TGCCAATGGG 180 
GAAAATGTAT GAAACGGCAT TTACTGTAAG CCG 213 


(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 196 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM8A/1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

AATTCGGCTT TGGGACGTGG TGAATGAGGC AATGCCAGAC AATGTTCGTC CTAACCCGTG 60
GAATCCCAAC CCCTCGCCCT ACCGTGACTC CCGCCACTAC AAATTGTGCG GCGACGAGTT 120
CATCGCCAAG GCATTCCAAT TCGCAAGGGA AGCCGACCCG AAGGCACAAT TGTTCAACAA 180 
CGACTACAAC AAGCCG 196 


(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM8A/3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

AATTCGGCTT GTTGTAGTCG TTGATGCACA GGACCGGGGC TTTGCCGTAC TTGGCGCAAG 60 

CCTCTGTTGC ATAGGCGAAT GCAGCATCAA CCCAGTCTTT GGTGCTCGGG TAATAATTGC 120 

CCCAAACAAA GTCGTTGGCA GATGCTCCCT GGGTGCGGAA TGCCCCGCCG GCACCGTCTG 180
CAAAGGTCTC GTTCACCACG TCCCAAAGCC G 211 

(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 240 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Hybrid DNA
(vi) SCIENTIFIC NAME: KM8B/7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

AATTCGGCTT GACGGGGGGA CGTACGACAT CTACAAGACC ACCAGATACG AACAGCCCTC 60 
TATCGACGGC ACACAGACCT TCGACCAGTA CTGGAGCGTA AGACAGTCCA AGCCACAGGG 120 
CGAGGGCAAG AAGATAGAAG GTACTATCTC AGTGTCCAAG CACTTCGATG CGTGGAAAAA 180
GTGCGGCCTT GAGCTCGGAA ATATGTATGA AGTANCTCTT ACTATCGAAG GGCTAAGCCG 240 

(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 229 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Hybrid DNA 

(vi) SCIENTIFIC NAME: KM8A/9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

AATTCCCGGA GGTTTGGCAG CCTTCAATAG TAAGAGCAGC TTCATACATT AATCCTAATT 60 

TCATTCCTTT GCTTGTCCAA GCTTTGAAGT GGTCACTTAC AGAAATAGTT CCACTAGTTT 120

TTTTTTCAGT TCTGACACTC CAGAATTGTT TAAATGTAGC AGTACCATCA ATTGAAGGTT 180
GATTAATTCT GTCAGTGGTA TANATATCAT ACGTCCCCCC ATCAAGCCG 229 

(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 234 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Hybrid DNA 

(vi) SCIENTIFIC NAME: KM8B/10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

AATTCGGCTT GACGGGGGGA CGTACGACAT ATACGAGACT ACTCGTTACA ACCAGCCTTC 60 

AATCGAAGGC AACACTACTT TCCAGCAGTA CTGGAGCGTT CGTACATCCA AGCGCACCAG 120 

CGGTACCATT TCCGTATCCG AGCACTTTAA GGCTTGGGAA CGCATGGGTA TGAGATGCGG 180
AAACCTTTAT GAGACTGCTT TAACTGTTGA GGGCTACCAN ACCACCGGAA GCCG 234 

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1060 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Humicola insolens 

(B) STRAIN: DSM 1800 
(ix) FEATURE: 
(A) NAME/KEY: mat_peptide

(B) LOCATION: 73..927
(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 10..72
(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 10..927

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 



GGATCCAAG ATG CGT TCC TCC CCC CTC CTC CCG TCC GCC GTT GTG GCC 48 
Met Arg Ser Ser Pro Leu Leu Pro Ser Ala Val Val Ala 
-21 -20 -15 -10 

GCC CTG CCG GTG TTG GCC CTT GCC GCT GAT GGC AGG TCC ACC CGC TAC 96
Ala Leu Pro Val Leu Ala Leu Ala Ala Asp Gly Arg Ser Thr Arg Tyr
-5 1 5

TGG GAC TGC TGC AAH CCT TCG TGC GGC TGG GCC AAG AAG GCT CCC GTG 144
Trp Asp Cys Cys Lys Pro Ser Cys Gly Trp Ala Lys Lys Ala Pro Val
10 15 20

AAC CAG CCT GTC TTT TCC TGC AAC GCC AAC TTC CAG CGT ATC ACG GAC 192
Asn Gln Pro Val Phe Ser Cys Asn Ala Asn Phe Gln Arg Ile Thr Asp
25 30 35 40

TTC GAC GCC AAG TCC GGC TGC GAG CCG GGC GGT GTC GCC TAC TCG TGC 240
Phe Asp Ala Lys Ser Gly Cys Glu Pro Gly Gly Val Ala Tyr Ser Cys
45 50 55

GCC GAC CAG ACC CCA TGG GCT GTG AAC GAC GAC TTC GCG CTC GGT TTT 288
Ala Asp Gln Thr Pro Trp Ala Val Asn Asp Asp Phe Ala Leu Gly Phe
60 65 70

GCT GCC ACC TCT ATT GCC GGC AGC AAT GAG GCG GGC TGG TGC TGC GCC 336
Ala Ala Thr Ser Ile Ala Gly Ser Asn Glu Ala Gly Trp Cys Cys Ala
75 80 85

TGC TAC GAG CTC ACC TTC ACA TCC GGT CCT GTT GCT GGC AAG AAG ATG 384
Cys Tyr Glu Leu Thr Phe Thr Ser Gly Pro Val Ala Gly Lys Lys Met
90 95 100

GTC GTC CAG TCC ACC AGC ACT GGC GGT GAT CTT GGC AGC AAC CAC TTC 432
Val Val Gln Ser Thr Ser Thr Gly Gly Asp Leu Gly Ser Asn His Phe
105 110 115 120

GAT CTC AAC ATC CCC GGC GGC GGC GTC GGC ATC TTC GAC GGA TGC ACT 480
Asp Leu Asn Ile Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr
125 130 135

CCC CAG TTC GGC GGT CTG CCC GGC CAG CGC TAC GGC GGC ATC TCG TCC 528
Pro Gln Phe Gly Gly Leu Pro Gly Gln Arg Tyr Gly Gly Ile Ser Ser
140 145 150

CGC AAC GAG TGC GAT CGG TTC CCC GAC GCC CTC AAG CCC GGC TGC TAC 576
Arg Asn Glu Cys Asp Arg Phe Pro Asp Ala Leu Lys Pro Gly Cys Tyr
155 160 165

TGG CGC TTC GAC TGG TTC AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC 624
Trp Arg Phe Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe
170 175 180

CGT CAG GTC CAG TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC 672
Arg Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg
185 190 195 200

CGC AAC GAC GAC GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC 720
Arg Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser
205 210 215
ACC AGC TCT CCG GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC 768
Thr Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr
220 225 230

TCC ACC ACC TCG AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC 816
Ser Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys
235 240 245

ACT GCT GAG AGG TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGC TGC 864
Thr Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys
250 255 260

ACC ACC TGC GTC GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC 912
Thr Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr
265 270 275 280

CAT CAG TGC CTG TAGACGCAGG GCAGCTTGAG GGCCTTACTG CTGGCCGCAA 964
His Gln Cys Leu
285

CGAAATGACA CTCCCAATCA CTGTATTAGT TCTTGTACAT AATTTCGTCA TCCCTCCAGG 1024 
GATTGTCACA TAAATGCAAT GAGGAACAAT GAGTAC 1060 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 305 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: 

Met Arg Ser Ser Pro Leu Leu Pro Ser Ala Val Val Ala Ala Leu Pro
-21 -20 -15 -10

Val Leu Ala Leu Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp Asp Cys
-5 1 5 10

Cys Lys Pro Ser Cys Gly Trp Ala Lys Lys Ala Pro Val Asn Gln Pro
15 20 25

Val Phe Ser Cys Asn Ala Asn Phe Gln Arg Ile Thr Asp Phe Asp Ala
30 35 40

Lys Ser Gly Cys Glu Pro Gly Gly Val Ala Tyr Ser Cys Ala Asp Gln
45 50 55

Thr Pro Trp Ala Val Asn Asp Asp Phe Ala Leu Gly Phe Ala Ala Thr
60 65 70 75

Ser Ile Ala Gly Ser Asn Glu Ala Gly Trp Cys Cys Ala Cys Tyr Glu
80 85 90

Leu Thr Phe Thr Ser Gly Pro Val Ala Gly Lys Lys Met Val Val Gln
95 100 105

Ser Thr Ser Thr Gly Gly Asp Leu Gly Ser Asn His Phe Asp Leu Asn
110 115 120

Ile Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr Pro Gln Phe
125 130 135

Gly Gly Leu Pro Gly Gln Arg Tyr Gly Gly Ile Ser Ser Arg Asn Glu
140 145 150 155

Cys Asp Arg Phe Pro Asp Ala Leu Lys Pro Gly Cys Tyr Trp Arg Phe
160 165 170
Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln Val
175 180 185

Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn Asp
190 195 200

Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser
205 210 215

Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr
220 225 230 235

Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu
240 245 250

Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys
255 260 265

Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
270 275 280

Leu

(2) INFORMATION FOR SEQ ID NO: 57:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Conserved region" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

Thr Arg Tyr Trp Asp Cys Cys Lys Pro/Thr
1 5 



(2) INFORMATION FOR SEQ ID NO: 58:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Conserved region" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

Trp Arg Phe/Tyr Asp Trp Phe
1 5 

(2) INFORMATION FOR SEQ ID NO: 59: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer s"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

GCTGATGGCA GGTCCACIA/CG ITAC/TTGGGAC/T TGC/TTGC/TAAA/GA/C C 41 


(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer as" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

GTCGGCGTTC TTA/GAACCAA/GT CA/GA/TAICG/TCC 29 

(2) INFORMATION FOR SEQ ID NO: 61: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "forward primer 1"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

TGGttC/TAAGA ACGCCGACAA TCCG 24 


(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "reverse primer 1"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 






GCTCTAGAGC CTGCGTCTAC AGGCACTGAT 30 



(2) INFORMATION FOR SEQ ID NO: 63: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "forward primer 2" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

CGGGATCCCA TTTATGATGG TCGCGTGGTG GTCTCTATTT CTGTACGGCC 
TTCAGGTCGC GGCACCTGCT TTCGCTGCTG ATGGCAGGTC CAC 93



(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "reverse primer 2"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

GCTCTAGAGC CTGCGTCTAC AGGCACTGAT 30 



(2) INFORMATION FOR SEQ ID NO: 65:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 922 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: hybrid DNA
(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..922

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

CCA TTT ATG ATG CTC GCG TGG TGG TCT CTA TTT CTG TAC GGC CTT CAG 48
Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

GTC GCG GCA CCT GCT TTC GCT CCT GAT GGC AGG TCC ACG CGG TAC TGG 96
Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

GAT TGC TGT AAG CCG TCG TGC TCG TGG CCC GGC AAG GCG CTC CTG AAC 144
Asp Cys Cys Lys Pro Ser Cys Ser Trp Pro Gly Lys Ala Leu Val Asn
35 40 45

CAG CCC GTC TAC GCC CGC AAC GCA AAC TTC CAG CGC ATC ACC GAC CCC 192
Gln Pro Val Tyr Ala Arg Asn Ala Asn Phe Gln Arg Ile Thr Asp Pro
50 55 60

AAC GCC AAG TCC GGC TCC GAT GGC GGC TCC GCC TTC TCC TGC GCC GAC 240
Asn Ala Lys Ser Gly Cys Asp Gly Gly Ser Ala Phe Ser Cys Ala Asp
65 70 75 80

CAG ACC CCG TGG GCC GTG AGC GAC GAC TTT GCC TAC GGT TTC GCG GCT 288
Gln Thr Pro Trp Ala Val Ser Asp Asp Phe Ala Tyr Gly Phe Ala Ala
85 90 95

ACG GCG CTC GCC GGC CAG TCC GAG TCT TCG TGG TGC TGT GCC TGC TAC 336
Thr Ala Leu Ala Gly Gln Ser Glu Ser Ser Trp Cys Cys Ala Cys Tyr
100 105 110

GAA CTC ACC TTC ACT TCG GGC CCC GTT GCT GGC AAG AAG ATG GCT GTC 384
Glu Leu Thr Phe Thr Ser Gly Pro Val Ala Gly Lys Lys Met Ala Val
115 120 125

CAG TCC ACC AGC ACT GGC GGT GAC CTC GGT AGC AAC CAC TTT GAC CTC 432
Gln Ser Thr Ser Thr Gly Gly Asp Leu Gly Ser Asn His Phe Asp Leu
130 135 140

AAC ATG CCA GGT GGC GGT GTC CGC ATC TTC GAC GGC TGC TCG CCT CAG 480
Asn Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Ser Pro Gln
145 150 155 160

GTT GGC GGT CTC GCC GGC CAG CGC TAT GGC GGC GTC TCG TCC CGC AGC 528
Val Gly Gly Leu Ala Gly Gln Arg Tyr Gly Gly Val Ser Ser Arg Ser
165 170 175

GAA TGC GAC TCC TTC CCC GCG GCA CTC AAG CCC GGC TGC TAC TGG CGC 576
Glu Cys Asp Ser Phe Pro Ala Ala Leu Lys Pro Gly Cys Tyr Trp Arg
180 185 190

TAC GAC TGC TTT AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC CGT CAG 624
Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln
195 200 205

GTC CAG TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC CGC AAC 672
Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn
210 215 220

GAC GAC GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC ACC AGC 720
Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser
225 230 235 240

TCT CCG GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC TCC ACC 768
Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr
245 250 255
ACC TCG AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC ACT GCT 816
Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala
260 265 270

GAG AGG TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGG TGC ACC ACC 864
Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr
275 280 285

TGC GTC GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC CAT CAG 912
Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
290 295 300

TGC CTG TAG A 922
Cys Leu *
305



(2) INFORMATION FOR SEQ ID NO: 66: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 307 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

Asp Cys Cys Lys Pro Ser Cys Ser Trp Pro Gly Lys Ala Leu Val Asn
35 40 45

Gln Pro Val Tyr Ala Arg Asn Ala Asn Phe Gln Arg Ile Thr Asp Pro
50 55 60

Asn Ala Lys Ser Gly Cys Asp Gly Gly Ser Ala Phe Ser Cys Ala Asp
65 70 75 80

Gln Thr Pro Trp Ala Val Ser Asp Asp Phe Ala Tyr Gly Phe Ala Ala
85 90 95

Thr Ala Leu Ala Gly Gln Ser Glu Ser Ser Trp Cys Cys Ala Cys Tyr
100 105 110

Glu Leu Thr Phe Thr Ser Gly Pro Val Ala Gly Lys Lys Met Ala Val
115 120 125

Gln Ser Thr Ser Thr Gly Gly Asp Leu Gly Ser Asn His Phe Asp Leu
130 135 140

Asn Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Ser Pro Gln
145 150 155 160

Val Gly Gly Leu Ala Gly Gln Arg Tyr Gly Gly Val Ser Ser Arg Ser
165 170 175

Glu Cys Asp Ser Phe Pro Ala Ala Leu Lys Pro Gly Cys Tyr Trp Arg
180 185 190

Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln
195 200 205

Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn
210 215 220

Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser
225 230 235 240

Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr
245 250 255

Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala
260 265 270

Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr
275 280 285

Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
290 295 300

Cys Leu *
305

(2) INFORMATION FOR SEQ ID NO: 67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 922 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2..922

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

C CCA TTT ATG ATG GTC GCG TGG TGG TCT CTA TTT CTG TAC GGC CTT 46
Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu
1 5 10 15

CAG GTC GCG GCA CCT GCT TTC GCT GCT GAT GGC AGG TCC ACG AGG TAC 94
Gln Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr
20 25 30

TGG GAT TGT TGT AAG CCC TCT TGC TCC TGG GGC GAC AAG GCC TCG GTC 142
Trp Asp Cys Cys Lys Pro Ser Cys Ser Trp Gly Asp Lys Ala Ser Val
35 40 45

AGC GCC CCC GTC CTG ACC TGC GAC AAG AAC GAC AAC CCC ATC TCC GAC 190
Ser Ala Pro Val Leu Thr Cys Asp Lys Asn Asp Asn Pro Ile Ser Asp
50 55 60

GCC AAC GCC GTG AGC GGT TGC AAC GGC GGC ACT TCC TAC ACC TGC AGC 238
Ala Asn Ala Val Ser Gly Cys Asn Gly Gly Thr Ser Tyr Thr Cys Ser
65 70 75

AAC AAC TCC CCG TGG GCT GTC AAC GAC AAC CTC GCC TAT GGC TTT GCC 286
Asn Asn Ser Pro Trp Ala Val Asn Asp Asn Leu Ala Tyr Gly Phe Ala
80 85 90 95

GCT ACC AAG CTC TCT GGA GGC TCC GAG TCC AGC TGG TGC TGT GCT TGC 334
Ala Thr Lys Leu Ser Gly Gly Ser Glu Ser Ser Trp Cys Cys Ala Cys
100 105 110

TAC GCT CTC ACC TTT ACG ACT GGC CCC GTG AAG GGC AAG ACC ATG GTC 382
Tyr Ala Leu Thr Phe Thr Thr Gly Pro Val Lys Gly Lys Thr Met Val
115 120 125

GTA CAG TCC ACC AAC ACC GGA GGC GAT CTC GGC GAG AAC CAC TTC GAT 430
Val Gln Ser Thr Asn Thr Gly Gly Asp Leu Gly Glu Asn His Phe Asp
130 135 140

CTC CAG ATG CCC GGC GGC GGT GTC GGC ATC TTT GAC GGC TGC AGC TCC 478
Leu Gln Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Ser Ser
145 150 155
CAG TGG GGT GGC CTC GGC GGT GCT CAG TAC GGC GGC ATC TCG TCG CGA 526
Gln Trp Gly Gly Leu Gly Gly Ala Gln Tyr Gly Gly Ile Ser Ser Arg
160 165 170 175

AGC GAC TGC GAC AGC TTC CCC GAG CTG CTC AAG GAC GGC TGC TAC TGG 574
Ser Asp Cys Asp Ser Phe Pro Glu Leu Leu Lys Asp Gly Cys Tyr Trp
180 185 190

CGC TAC GAC TGG TTC AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC CGT 622
Arg Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg
195 200 205

CAG GTC CAG TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC CGC 670
Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg
210 215 220

AAC GAC GAC GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC ACC 718
Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr
225 230 235

AGC TCT CCG GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC TCC 766
Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser
240 245 250 255

ACC ACC TCG AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC ACT 814
Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr
260 265 270

GCT GAG AGG TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGC TGC ACC 862
Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr
275 280 285

ACC TGC GTC GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC CAT 910
Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His
290 295 300

CAG TGC CTG TAG 922
Gln Cys Leu *
305

(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 307 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:



Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

Asp Cys Cys Lys Pro Ser Cys Ser Trp Gly Asp Lys Ala Ser Val Ser
35 40 45

Ala Pro Val Leu Thr Cys Asp Lys Asn Asp Asn Pro Ile Ser Asp Ala
50 55 60

Asn Ala Val Ser Gly Cys Asn Gly Gly Thr Ser Tyr Thr Cys Ser Asn
65 70 75 80

Asn Ser Pro Trp Ala Val Asn Asp Asn Leu Ala Tyr Gly Phe Ala Ala
85 90 95

Thr Lys Leu Ser Gly Gly Ser Glu Ser Ser Trp Cys Cys Ala Cys Tyr 
100 105 110

Ala Leu Thr Phe Thr Thr Gly Pro Val Lys Gly Lys Thr Met Val Val
115 120 125

Gln Ser Thr Asn Thr Gly Gly Asp Leu Gly Glu Asn His Phe Asp Leu
130 135 140

Gln Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Ser Ser Gln
145 150 155 160

Trp Gly Gly Leu Gly Gly Ala Gln Tyr Gly Gly Ile Ser Ser Arg Ser
165 170 175

Asp Cys Asp Ser Phe Pro Glu Leu Leu Lys Asp Gly Cys Tyr Trp Arg
180 185 190

Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln
195 200 205

Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn
210 215 220

Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser
225 230 235 240

Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr
245 250 255

Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala
260 265 270

Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr
275 280 285

Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
290 295 300

Cys Leu *
305

(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 928 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS

(B) LOCATION: 1..928
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

CCA TTT ATG ATG GTC GCG TGG TGG TCT CTA TTT CTG TAC GGC CTT CAG 48
Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

GTC GCG GCA CCT GCT TTC GCT GCT GAT GGC AGG TCC ACG AGG TAC TGG 96
Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

GAT TGC TGC AAG CCC TCT TGC TCT TGG GGC GGA AAG GCT GCT GTC AGC 144
Asp Cys Cys Lys Pro Ser Cys Ser Trp Gly Gly Lys Ala Ala Val Ser
35 40 45

GCC CCT GCT TTG ACC TGT GAC AAG AAG GAC AAC CCC ATC TCA AAC CTG 192
Ala Pro Ala Leu Thr Cys Asp Lys Lys Asp Asn Pro Ile Ser Asn Leu
50 55 60
AAC GCT CTC AAC GGT TGT GAG GGT GGT GGT TCT GCC TTC GCC TGC ACC 240
Asn Ala Val Asn Gly Cys Glu Gly Gly Gly Ser Ala Phe Ala Cys Thr
65 70 75 80

AAC TAC TCT CCT TGG GCG GTC AAT GAC AAC CTT GCC TAC GGC TTC GCT 288
Asn Tyr Ser Pro Trp Ala Val Asn Asp Asn Leu Ala Tyr Gly Phe Ala
85 90 95

GCA ACC AAG CTT GCC GGT GGC TCC GAG GGT AGC TGG TGC TGT GCT TGC 336
Ala Thr Lys Leu Ala Gly Gly Ser Glu Gly Ser Trp Cys Cys Ala Cys
100 105 110

TAC GCA CTT ACC TTC ACC ACC GGT CCC GTC AAG GGT AAG ACC ATG GTC 384
Tyr Ala Leu Thr Phe Thr Thr Gly Pro Val Lys Gly Lys Thr Met Val
115 120 125

GTC CAG TCC ACC AAC ACT GGA GGC GAC CTC GGT GAC AAC CAC TTC GAT 432
Val Gln Ser Thr Asn Thr Gly Gly Asp Leu Gly Asp Asn His Phe Asp
130 135 140

CTT ATG ATG CCT GGT GGC GGT GTT GGA ATC TTC GAC GGT TGC ACT TCT 480
Leu Met Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr Ser
145 150 155 160

CAG TTC GGC AAG GCT CTC GGT GGT GCC CAG TAC GGT GGC ATC TCC TCC 528
Gln Phe Gly Lys Ala Leu Gly Gly Ala Gln Tyr Gly Gly Ile Ser Ser
165 170 175

CGA AGC GAG TGC GAC AGC TTC CCT GAG ACT CTC AAG GAC GGT TGC CAT 576
Arg Ser Glu Cys Asp Ser Phe Pro Glu Thr Leu Lys Asp Gly Cys His
180 185 190

TGG CGC TTC GAC TGG TTC AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC 624
Trp Arg Phe Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe
195 200 205

CGT CAG GTC CAG TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC 672
Arg Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg
210 215 220

CGC AAC GAC GAC GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC 720
Arg Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser
225 230 235 240

ACC AGC TCT CCG GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC 768
Thr Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr
245 250 255

TCC ACC ACC TCG AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC 816
Ser Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys
260 265 270

ACT GCT GAG AGG TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGC TGC 864
Thr Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys
275 280 285

ACC ACC TGC GTC GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC 912
Thr Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr
290 295 300

CAT CAG TGC CTG TAG A 928
His Gln Cys Leu *
305

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 309 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

Asp Cys Cys Lys Pro Ser Cys Ser Trp Gly Gly Lys Ala Ala Val Ser
35 40 45

Ala Pro Ala Leu Thr Cys Asp Lys Lys Asp Asn Pro Ile Ser Asn Leu
50 55 60

Asn Ala Val Asn Gly Cys Glu Gly Gly Gly Ser Ala Phe Ala Cys Thr
65 70 75 80

Asn Tyr Ser Pro Trp Ala Val Asn Asp Asn Leu Ala Tyr Gly Phe Ala
85 90 95

Ala Thr Lys Leu Ala Gly Gly Ser Glu Gly Ser Trp Cys Cys Ala Cys
100 105 110

Tyr Ala Leu Thr Phe Thr Thr Gly Pro Val Lys Gly Lys Thr Met Val
115 120 125

Val Gln Ser Thr Asn Thr Gly Gly Asp Leu Gly Asp Asn His Phe Asp
130 135 140

Leu Met Met Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr Ser
145 150 155 160

Gln Phe Gly Lys Ala Leu Gly Gly Ala Gln Tyr Gly Gly Ile Ser Ser
165 170 175

Arg Ser Glu Cys Asp Ser Phe Pro Glu Thr Leu Lys Asp Gly Cys His
180 185 190

Trp Arg Phe Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe
195 200 205

Arg Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg
210 215 220

Arg Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser
225 230 235 240

Thr Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr
245 250 255

Ser Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys
260 265 270

Thr Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys
275 280 285

Thr Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr
290 295 300

His Gln Cys Leu *



(2) INFORMATION FOR SEQ ID NO: 71: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 915 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..915

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 



ATG ATG GTC GCG TGG TGG TCT CTA TTT CTG TAC GGC CTT CAG GTC GCG 48
Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln Val Ala
1 5 10 15

GCA CCT GCT TTC GCT GCT GAT GGC AGG TCC ACG AGG TAT TGG GAT TGT 96
Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp Asp Cys
20 25 30

TGC AAG CCG TCA TGT GCT TGG TCC GGC AAG GCC TCA GTG TCA TCT CCC 144
Cys Lys Pro Ser Cys Ala Trp Ser Gly Lys Ala Ser Val Ser Ser Pro
35 40 45

GTG CGA ACC TGT GAC GCA AAC AAC TCG CCG CTG TCC GAC GTC GAC GCA 192
Val Arg Thr Cys Asp Ala Asn Asn Ser Pro Leu Ser Asp Val Asp Ala
50 55 60

AAG AGT GCG TGC GAT GGA GGC GTT GCT TAC ACT TGT TCA AAC AAC GCG 240
Lys Ser Ala Cys Asp Gly Gly Val Ala Tyr Thr Cys Ser Asn Asn Ala
65 70 75 80

CCT TGG GCT GTT AAC GAT AAC CTC TCT TAT GGT TTC GCG GCC ACA GCT 288
Pro Trp Ala Val Asn Asp Asn Leu Ser Tyr Gly Phe Ala Ala Thr Ala
85 90 95

ATC AAT GGC GGC AGC GAG TCT AGC TGG TGC TGT GCA TGC TAC AAG TTG 336
Ile Asn Gly Gly Ser Glu Ser Ser Trp Cys Cys Ala Cys Tyr Lys Leu
100 105 110

ACT TTC ACG AGC GGA CCT GCT TCT GGA AAG GTC ATG GTC GTT CAA TCA 384
Thr Phe Thr Ser Gly Pro Ala Ser Gly Lys Val Met Val Val Gln Ser
115 120 125

ACC AAC ACC GGG TAC GAT CTC TCT AAC AAC CAC TTT GAC ATT CTT ATG 432
Thr Asn Thr Gly Tyr Asp Leu Ser Asn Asn His Phe Asp Ile Leu Met
130 135 140

CCA GGT GGC GGT GTT GGA GCG TTC GAC GGC TGC TCT AGG CAG TAC GGC 480
Pro Gly Gly Gly Val Gly Ala Phe Asp Gly Cys Ser Arg Gln Tyr Gly
145 150 155 160

AGC ATC CCT GGG GAG CGA TAT GGG GGT GTC ACA TCA AGG GAC CAA TGC 528
Ser Ile Pro Gly Glu Arg Tyr Gly Gly Val Thr Ser Arg Asp Gln Cys
165 170 175

GAC CAA ATG CCA AGT GCA CTC AAG CAG GGC TGC TAT TGG CGC TTC GAT 576
Asp Gln Met Pro Ser Ala Leu Lys Gln Gly Cys Tyr Trp Arg Phe Asp
180 185 190

TGG TTC AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC CGT CAG GTC CAG 624
Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln Val Gln
195 200 205

TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC CGC AAC GAC GAC 672
Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn Asp Asp
210 215 220

GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC ACC AGC TCT CCG 720
Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro
225 230 235 240

GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC TCC ACC ACC TCG 768
Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser
245 250 255

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AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC ACT GCT GAG AGG 816 

Ser Pro Pro Val Gin Pro Thr Thr Pro Ser Gly Cya Thr Ala Glu Arp 
5 260 265 270 

TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGC TGC ACC ACC TGC GTC 864 

Trp Ala Gin Cys Gly Gly Aon Gly Trp Ser Gly Cys Thr Thr Cys Val 

275 280 285 

10 GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC CAT CAG TGC CTG 912 

AAa ?iX Ser Thr Cys Thr Lys Iie Aan As P Tr P Tyr His Gin Cys Leu 
290 295 300 



15 



™° 915 
305 



(2) INFORMATION FOR SEQ ID NO: 72:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 305 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln Val Ala
1 5 10 15

Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp Asp Cys
20 25 30

Cys Lys Pro Ser Cys Ala Trp Ser Gly Lys Ala Ser Val Ser Ser Pro
35 40 45

Val Arg Thr Cys Asp Ala Asn Asn Ser Pro Leu Ser Asp Val Asp Ala
50 55 60

An H= Ser Ala Cys Asp Gly Gly Val Ala Tyr Thr Cys Ser Asn Asn Ala
65 70 75 80

Pro Trp Ala Val Asn Asp Asn Leu Ser Tyr Gly Phe Ala Ala Thr Ala
85 90 95

Ile Asn Gly Gly Ser Glu Ser Ser Trp Cys Cys Ala Cys Tyr Lys Leu
100 105 110

Thr Phe Thr Ser Gly Pro Ala Ser Gly Lys Val Met Val Val Gln Ser
115 120 125

Thr Asn Thr Gly Tyr Asp Leu Ser Asn Asn His Phe Asp Ile Leu Met
130 135 140

Pro Gly Gly Gly Val Gly Ala Phe Asp Gly Cys Ser Arg Gln Tyr Gly
145 150 155 160

Ser Ile Pro Gly Glu Arg Tyr Gly Gly Val Thr Ser Arg Asp Gln Cys
165 170 175

Asp Gln Met Pro Ser Ala Leu Lys Gln Gly Cys Tyr Trp Arg Phe Asp
180 185 190

Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg Gln Val Gln
195 200 205

Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg Asn Asp Asp
210 215 220

Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro
225 230 235 240

Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser
245 250 255

Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg
260 265 270

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
275 280 285

Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys Leu
290 295 300

305



(2) INFORMATION FOR SEQ ID NO: 73:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 925 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 2..925

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

C CCA TTT ATG ATG GTC GCG TGG TGG TCT CTA TTT CTG TAC GGC CTT 46
Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu
1 5 10 15

GAG GTC GCG GCA CCT GCT TTC GCT GCT GAT GGC AGG TCC ACG CGG TAT 94
Gln Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr
20 25 30

TGG GAT TGC TGT AAG CCC AGC TGC TCC TGG CCC GAC AAG GCC CCC GTA 142
Trp Asp Cys Cys Lys Pro Ser Cys Ser Trp Pro Asp Lys Ala Pro Val
35 40 45

GGT TCC CCC GTA GGC ACC TGC GAC GCC GGC AAC AGC CCC CTC GGC GAC 190
Gly Ser Pro Val Gly Thr Cys Asp Ala Gly Asn Ser Pro Leu Gly Asp
50 55 60

CCC CTG GCC AAG TCT GGC TGC GAG GGC GGC CCG TCG TAC ACG TGC GCC 238
Pro Leu Ala Lys Ser Gly Cys Glu Gly Gly Pro Ser Tyr Thr Cys Ala
65 70 75

AAC TAC CAG CCG TGG GCG GTC AAC GAC CAG CTG GCC TAC GGC TTC GCG 286
Asn Tyr Gln Pro Trp Ala Val Asn Asp Gln Leu Ala Tyr Gly Phe Ala
80 85 90 95

GCC ACG GCC ATC AAC GGC GGC ACC GAG GAC TCG TGG TGC TGC GCC TGC 334
Ala Thr Ala Ile Asn Gly Gly Thr Glu Asp Ser Trp Cys Cys Ala Cys
100 105 110

TAC AAG CTC ACC TTC ACC GAC GGC CCG GCC TCG GGC AAG ACC ATG ATC 382
Tyr Lys Leu Thr Phe Thr Asp Gly Pro Ala Ser Gly Lys Thr Met Ile
115 120 125

GTC CAG TCC ACC AAC ACG GGC GGC GAC CTG TCC GAC AAC CAC TTC GAC 430
Val Gln Ser Thr Asn Thr Gly Gly Asp Leu Ser Asp Asn His Phe Asp
130 135 140

CTG CTC ATC CCC GGC GGC GGC GTC GGC ATC TTC GAC GGC TGC ACC TCC 478
Leu Leu Ile Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr Ser
145 150 155

CAG TAC GGC CAG GCC CTG CCC GGC GCC CAG TAC GGC GGC GTC AGC TCC 526
Gln Tyr Gly Gln Ala Leu Pro Gly Ala Gln Tyr Gly Gly Val Ser Ser
160 165 170 175

CGC GCC GAG TGC GAC CAG ATG CCC GAG GCC ATC AAG GCC GGC TGC CAG 574
Arg Ala Glu Cys Asp Gln Met Pro Glu Ala Ile Lys Ala Gly Cys Gln
180 185 190

TGG CGC TAC GAT TGG TTT AAG AAC GCC GAC AAT CCG AGC TTC AGC TTC 622
Trp Arg Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe
195 200 205

CGT CAG GTC CAG TGC CCA GCC GAG CTC GTC GCT CGC ACC GGA TGC CGC 670
Arg Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg
210 215 220

CGC AAC GAC GAC GGC AAC TTC CCT GCC GTC CAG ATC CCC TCC AGC AGC 718
Arg Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser
225 230 235

ACC AGC TCT CCG GTC AAC CAG CCT ACC AGC ACC AGC ACC ACG TCC ACC 766
Thr Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr
240 245 250 255

TCC ACC ACC TCG AGC CCG CCA GTC CAG CCT ACG ACT CCC AGC GGC TGC 814
Ser Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys
260 265 270

ACT GCT GAG AGG TGG GCT CAG TGC GGC GGC AAT GGC TGG AGC GGC TGC 862
Thr Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys
275 280 285

ACC ACC TGC GTC GCT GGC AGC ACT TGC ACG AAG ATT AAT GAC TGG TAC 910
Thr Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr
290 295 300

CAT CAG TGC CTG TAG 925
His Gln Cys Leu *
305
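
The relationship between the CDS location given for SEQ ID NO: 73 (2..925) and the length of 308 stated for SEQ ID NO: 74 can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the original listing; it assumes the standard genetic code, and the helper names `codon_count` and `translate` are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive CDS location such as 2..925."""
    length = end - start + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# CDS 2..925 spans 924 bases, i.e. 307 residues plus the stop codon.
print(codon_count(2, 925))               # 308
# Final row of SEQ ID NO: 73: His Gln Cys Leu followed by the stop codon.
print(translate("CAT CAG TGC CTG TAG"))  # HQCL*
```

The count of 308 includes the terminating codon, which is why the listing numbers the first residue of the final row (His) as position 304 and SEQ ID NO: 74 ends with an asterisk.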

(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 308 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:

Pro Phe Met Met Val Ala Trp Trp Ser Leu Phe Leu Tyr Gly Leu Gln
1 5 10 15

Val Ala Ala Pro Ala Phe Ala Ala Asp Gly Arg Ser Thr Arg Tyr Trp
20 25 30

Asp Cys Cys Lys Pro Ser Cys Ser Trp Pro Asp Lys Ala Pro Val Gly
35 40 45

Ser Pro Val Gly Thr Cys Asp Ala Gly Asn Ser Pro Leu Gly Asp Pro
50 55 60

Leu Ala Lys Ser Gly Cys Glu Gly Gly Pro Ser Tyr Thr Cys Ala Asn
65 70 75 80

Tyr Gln Pro Trp Ala Val Asn Asp Gln Leu Ala Tyr Gly Phe Ala Ala
85 90 95

Thr Ala Ile Asn Gly Gly Thr Glu Asp Ser Trp Cys Cys Ala Cys Tyr
100 105 110

Lys Leu Thr Phe Thr Asp Gly Pro Ala Ser Gly Lys Thr Met Ile Val
115 120 125

Gln Ser Thr Asn Thr Gly Gly Asp Leu Ser Asp Asn His Phe Asp Leu
130 135 140

Leu Ile Pro Gly Gly Gly Val Gly Ile Phe Asp Gly Cys Thr Ser Gln
145 150 155 160

Tyr Gly Gln Ala Leu Pro Gly Ala Gln Tyr Gly Gly Val Ser Ser Arg
165 170 175

Ala Glu Cys Asp Gln Met Pro Glu Ala Ile Lys Ala Gly Cys Gln Trp
180 185 190

Arg Tyr Asp Trp Phe Lys Asn Ala Asp Asn Pro Ser Phe Ser Phe Arg
195 200 205

Gln Val Gln Cys Pro Ala Glu Leu Val Ala Arg Thr Gly Cys Arg Arg
210 215 220

Asn Asp Asp Gly Asn Phe Pro Ala Val Gln Ile Pro Ser Ser Ser Thr
225 230 235 240

Ser Ser Pro Val Asn Gln Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser
245 250 255

Thr Thr Ser Ser Pro Pro Val Gln Pro Thr Thr Pro Ser Gly Cys Thr
260 265 270

Ala Glu Arg Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr
275 280 285

Thr Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His
290 295 300

Gln Cys Leu *
305

PATENT CLAIMS

1. A method for providing a novel DNA sequence encoding a
polypeptide from a micro-organism with an activity of interest,
comprising the following steps:

i) PCR amplification of said DNA with PCR primers with homology
to (a) known gene(s) encoding a polypeptide with an activity of
interest,

ii) linking the obtained PCR product to a 5' structural gene
sequence and a 3' structural gene sequence,

iii) expressing said resulting hybrid DNA sequence,

iv) screening for hybrid DNA sequences encoding a polypeptide
with said activity of interest or related activity,

v) isolating the hybrid DNA sequence identified in step iv).

2. The method according to claim 1 wherein the PCR primers in
step i) have homology to conserved regions in (a) known
structural gene(s) or the polypeptide(s) thereof.

3. The method according to claim 1 wherein the PCR primers in
step i) are degenerated on the basis of conserved regions in (a)
known gene(s).

4. The method according to any of claims 1 to 3 wherein the PCR
amplification in step i) is performed using naturally occurring
DNA as template.

5. The method according to any of claims 1 to 3 wherein the
microorganism has not been subjected to "in vitro" selection.

6. The method according to any of claims 1 to 5 wherein the PCR
amplification in step i) is performed on a sample containing DNA
from an un-isolated microorganism.

7. The method according to any of claims 1 to 6 wherein the 5'
and 3' structural gene sequences originate from two different
structural genes encoding polypeptides having the same activity.

8. The method according to any of claims 1 to 7 wherein the 5'
structural gene sequence and the 3' structural gene sequence
originate from the same structural gene sequence.

9. The method according to any of claims 1 to 8 wherein the 5'
structural gene sequence and the 3' structural gene sequence
originate from two different structural gene sequences encoding
polypeptides having different activities.

10. The method according to any of claims 1 to 9 comprising the
following steps:

i) PCR amplification of DNA from micro-organisms with
PCR primers being homologous to conserved regions of
a known gene encoding a polypeptide with an activity of
interest,

ii) cloning the obtained PCR product into a gene encoding
a polypeptide having the activity of interest, where
said gene is not identical to the gene from which the PCR
product is obtained, which gene is situated in an
expression vector,

iii) transforming said expression vector into a suitable
host cell,

iiia) culturing said host cell under suitable conditions,

iv) screening for clones comprising a DNA sequence
originated from the PCR amplification in step i)
encoding a polypeptide with said activity of
interest or related activity,

v) isolating the DNA sequence identified in step iv).

11. The method according to claims 1 to 10, wherein the
micro-organism from which DNA is to be PCR amplified in step i)
is a prokaryote or an eukaryote.

12. The method according to any of claims 1 to 11, wherein the
PCR amplification in step i) is performed on DNA from an
un-cultivable organism.

13. The method according to claim 12, wherein the un-cultivable
organism is an algae, a fungi or a protozoa.

14. The method according to claims 12 and 13, wherein said
un-cultivable organism is from the group of extremophiles and
planktonic marine organisms.

15. The method according to any of claims 1 to 11, wherein the
PCR amplification in step i) is performed on DNA from a
cultivable organism.

16. The method according to claim 15, wherein said cultivable
organism is selected from the group of bacteria, fungal
organisms, such as filamentous fungi or yeasts.

17. The method according to claim 16, wherein said PCR
amplification in step i) is performed on one or more polynucleotides
comprised in a vector, plasmid or the like, such as on a cDNA
library from cultivable organisms.

18. The method according to any of claims 1 to 17, wherein said
activity of interest is an enzymatic activity.

19. The method according to claim 18, wherein said enzyme
activity is selected from the group comprising phosphatases,
oxidoreductases, transferases, hydrolases, such as esterases, in
particular lipases and phytases, such as glucosidases, in
particular xylanases, cellulases, hemicellulases, and amylases,
such as peptidases, in particular proteases, lyases, isomerases
and ligases.

20. The method according to any of claims 10 to 19, wherein said
host cell mentioned under iii) of claim 10 is a micro-organism,
preferably a yeast or a bacteria.

21. The method according to claim 20, wherein said host cell is
a yeast such as a strain of Saccharomyces, in particular
Saccharomyces cerevisiae.

22. The method according to claim 20, wherein said host cell is
a bacteria such as a strain of Bacillus, in particular of
Bacillus subtilis, or a strain of Escherichia coli.

23. The method according to any of claims 1 to 22, wherein the
clones/hybrid DNA sequences mentioned in step iv), are screened
for enzymatic activity.

24. The method according to claim 23, wherein the screened
clones/hybrid DNA sequences are tested for wash performance.

25. A novel DNA sequence provided according to any of the method
claims 1 to 24.

26. A polypeptide with an activity of interest encoded by a DNA 
sequence of claim 25. 
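
By way of illustration only, the degenerate primers referred to in claim 3 may be sketched as follows. The primer used here is a hypothetical example and not a primer disclosed in this document: it back-translates the peptide Trp Asp Cys Cys Lys Pro, which occurs in both SEQ ID NO: 72 and SEQ ID NO: 74, using the standard IUPAC nucleotide ambiguity codes. The function names `degeneracy` and `expand` are likewise assumptions made for the sketch.

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list:
    """List every concrete sequence covered by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


# Hypothetical primer: TGG GAY TGY TGY AAR CCN (Trp Asp Cys Cys Lys Pro).
primer = "TGGGAYTGYTGYAARCCN"
variants = expand(primer)

print(degeneracy(primer))                # 64
# The corresponding stretch of SEQ ID NO: 73 is one of the 64 variants.
print("TGGGATTGCTGTAAGCCC" in variants)  # True
```

A degeneracy of 64 means the primer pool contains 64 distinct oligonucleotides, one of which matches the known sequence exactly while the others may anneal to related, as yet unknown genes.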